How Does DNA Work?

Today we're taking a conceptual look at the amazing molecule that is DNA: what it looks like, how it works, and how it is copied and used to build proteins in your cells.

DNA is a molecule that forms the basis of all life on Earth. Its name is short for deoxyribonucleic acid, and it is made of repeated building blocks called nucleotides, each consisting of a nitrogenous base, a deoxyribose sugar and a phosphate group.

On the surface, DNA is a simple and beautiful molecule. Yet it creates the complexity of all living things including plants, bacteria, fungi and animals.

The process of evolution has created some mind-boggling systems to record DNA's critical data, replicate it in all living cells, and execute that code in order to create and sustain life.

What is DNA?

So DNA is the hereditary material inside nearly every cell of your body (mature red blood cells are a notable exception), and it gets copied every time a cell divides. Think of it like a recipe book for cooking up a human being from scratch.

Not just any human being, of course, but specifically you, since your DNA is unique to you. With the exception of any clones you forgot to mention. Or identical twins, which are nature's clones. But that's really all.

The chapters of the recipe book are like chromosomes. Humans have 46 chromosomes: 23 come from your mother and 23 from your father.

It turns out that chromosome number doesn't reflect an organism's complexity, but rather the evolutionary history of its DNA.

Over many generations, chromosomes go through many duplications and mutations. And because the whole process of mutation is random, nobody's doing any housekeeping.

So after millions of years, we end up with a fair bit of redundancy, including long stretches of DNA with no known use.

But back to the DNA recipe book. If chromosomes are chapters, then genes are individual recipes in each chapter.

While some traits are commonly described as single-gene traits (like long eyelashes or a cleft chin), most traits are the result of multiple genes interacting (such as your eye colour, your intelligence, or your coat colour if you're a Golden Retriever).

We'll look at how genes are expressed in a moment. First, let's take a quick look at how these concepts link together in reality.

How DNA, Chromosomes and Genes Fit Together

Take a moment to visualise thousands of genes coiling into 46 chromosomes to form your complete set of DNA.

Starting at the smallest scale, we have genes contained in long stretches of the double helix.

The helix is then wound around proteins called histones to form neat little packages called nucleosomes.

The condensing strand coils and twists further into increasingly large bundles to form supercoiled chromatin fibres, and the supercoils fold into loops which wind even further until we have a chromosome.

Collectively, DNA and its packaging proteins are called chromatin, and each chromosome contains a single, very long molecule of DNA.

This bundling and coiling is an incredibly fine-tuned process, with different coiling styles in different cells, appropriate to local gene expression. In other words, while nearly every cell in your body carries the genes involved in eye colour, the DNA in the pigment cells of your iris is packaged so that the pigment genes sit in loosely coiled, accessible regions of the chromosome, ready for expression.

And this coiling process happens a lot.

Every time a cell in your body divides (anywhere from about once a day to hardly ever, depending on the cell's job), its DNA must first be copied. The chromosomes loosen into vast stretches of DNA and the two strands are unzipped, exposing all the inside ladder rungs for replication. The copies then coil back up in a highly specific manner into neat, tightly packed bundles, ready to be shared between the two new cells.

Extreme Close-Up: The Structure of DNA

From now on, we're going to zoom in and look at how DNA works very closely indeed.

The double helix shape of DNA was deduced by the (then) young scientists Watson and Crick in 1953. Prior to their discovery, no-one knew what shape DNA took. Some thought it was a single strand; others worked on the hypothesis of a triple helix.

Crucially, Watson and Crick had access to Rosalind Franklin's X-ray crystallography images of DNA, from which they inferred the double-helix shape. To check their hypothesis, they built an oversized physical model to demonstrate how all the pieces fit together. Their DNA model fit together perfectly.

So we can think of DNA as a jigsaw puzzle, except there are only six different piece shapes and about 18 billion pieces overall.

The six shapes to consider are the four DNA bases (adenine, thymine, guanine and cytosine) and the two parts of the DNA backbone (deoxyribose sugar rings and phosphate groups).
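The 18 billion figure is simple arithmetic. Here's a small Python sketch that reproduces it, assuming one copy of the genome (about three billion base pairs, the figure used for the train track below) and that each nucleotide contributes exactly three pieces: one base, one sugar and one phosphate. The constant and function names are just illustrative.

```python
# Rough piece count for the "DNA jigsaw", assuming one copy of the genome
# and exactly three pieces per nucleotide (base + sugar + phosphate).
BASE_PAIRS = 3_000_000_000      # approximate size of one copy of the human genome
BASES_PER_PAIR = 2              # one base on each strand
PIECES_PER_NUCLEOTIDE = 3       # base + deoxyribose sugar + phosphate

PIECE_SHAPES = ["adenine", "thymine", "guanine", "cytosine", "deoxyribose", "phosphate"]

def jigsaw_pieces(base_pairs: int) -> int:
    """Return the total number of base, sugar and phosphate pieces."""
    nucleotides = base_pairs * BASES_PER_PAIR
    return nucleotides * PIECES_PER_NUCLEOTIDE

print(len(PIECE_SHAPES), f"{jigsaw_pieces(BASE_PAIRS):,}")  # 6 18,000,000,000
```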

Because of their chemical structure and physical attractions, the bases and backbone are compelled to join together like a ladder. Since molecules are rule-bound, we find this beautiful order emerging out of apparent chaos.

How DNA Bases Create Genes

Imagine yourself in a landscape with a train track stretching out behind you and in front of you, all the way into the distance.

The rails running along the track are like the sugar-phosphate backbone of DNA, while the sleepers running across it are like the complementary base pairs (A always pairs with T, and G always pairs with C).
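Because the pairing rules are fixed, one side of the track completely determines the other. This minimal Python sketch writes out the partner base for each position. It's a simplification: it ignores the fact that the two strands run in opposite directions, so reading the partner strand in its own direction would mean reversing the result.

```python
# Base-pairing rules for the two strands of DNA.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(strand: str) -> str:
    """Return the base that sits opposite each position (A-T, G-C)."""
    return "".join(PAIRS[base] for base in strand.upper())

print(complementary_strand("ATGCGC"))  # TACGCG
```

Applying the function twice gives back the original strand, which is exactly why either strand can serve as a template when DNA is copied.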

As you walk along the train track, you can count off the sleepers for miles and miles.

The average gene is about 27,000 base pairs long, although some are as long as two million base pairs, which is quite a lot of train track. If you were to walk your entire genome (one full copy of your DNA), you would pass three billion sleepers (or six billion bases in total).

Surprisingly, not every sleeper is part of useful, protein-coding DNA. Far from it.

In humans, only about 2% of our DNA actually codes for proteins, which is DNA's best-known job. The other 98%, called non-coding DNA (or sometimes junk DNA), is a mixed bag of knowns and unknowns.
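To put that 2% in perspective, here's the arithmetic as a tiny Python sketch, using the rounded figures from this post (three billion base pairs, 2% coding). It uses integer maths so there's no floating-point rounding to worry about; the function name is just illustrative.

```python
def coding_split(genome_bp: int, coding_percent: int) -> tuple[int, int]:
    """Split a genome into coding and non-coding base pairs (integer maths)."""
    coding = genome_bp * coding_percent // 100
    return coding, genome_bp - coding

coding, non_coding = coding_split(3_000_000_000, 2)
print(f"{coding:,} coding vs {non_coding:,} non-coding base pairs")
# 60,000,000 coding vs 2,940,000,000 non-coding base pairs
```

In train-track terms, that's roughly 60 million useful sleepers scattered along three billion.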

While some stretches have been identified as regulatory DNA (controlling when and how strongly genes are switched on) or viral DNA (left behind by ancient viruses that inserted themselves into our ancestors' germ line), much of it is still a complete mystery.

How Does DNA Work?

Now we have the structural intricacies covered, how does DNA work?

Recall that DNA is like a cookbook. Chromosomes are like individual chapters, and genes are like individual recipes. Much of the book, though, doesn't make pie or soup or apple crumble at all: it's filler and old relics that are redundant to your current meal plan.

In this analogy, who's the cook? Who's mixing the ingredients? And what do they make?

This is called gene expression, and it's happening all the time in your body.

Genes are expressed on demand, and there's a great deal of activity during rapid growth phases like embryonic development and puberty.

But gene expression is also essential for bog-standard daily living, such as producing insulin to "unlock" your cells and allow them to take up sugar from your food.

Such responses are regulated on a real-time basis, underpinning the life-sustaining nature of your DNA.

Gene expression takes place in three stages. The underlying flow of information, from DNA to RNA to protein, is known as the Central Dogma of molecular biology.

Step 1. Transcription (Photocopying The Recipe)

The first stage is transcription, where a stretch of the DNA helix unwinds and the gene is copied into a single-stranded molecule called RNA.

This RNA stuff is pretty similar to DNA, with some key molecular differences: it uses ribose sugar instead of deoxyribose, and the base uracil (U) in place of thymine (T).

This stage is like taking a photocopy of a recipe to work with. Imagine your kitchen is messy and you often set things on fire and you want to keep your very important recipe book super nice and clean for future use. Especially seeing as you like to cook thousands of times a day and there's no way to buy a new cookbook if you ruin this one.

During transcription, a large protein molecule called RNA polymerase works its way along the DNA helix, teasing apart the two strands with the help of its enzyme pals.

The RNA strand (the photocopy) is then assembled from free-floating RNA nucleotides in the surrounding goo. Each one is attracted to its complementary base on the exposed DNA strand without much fuss, and the polymerase links them together.

It's all very elegant.
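Here's that photocopying step as a minimal Python sketch. It builds RNA base by base against a DNA template strand, using the same pairing rules as DNA except that adenine on the template is matched by uracil rather than thymine. Like the real thing, it ignores the rest of the cell's machinery, and it doesn't model strand direction; the function name is illustrative.

```python
# Pairing rules when building RNA against a DNA template strand.
# RNA has uracil (U) wherever DNA would have thymine (T).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template: str) -> str:
    """Build the RNA 'photocopy' base by base from a DNA template strand."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in template.upper())

print(transcribe("TACGCG"))  # AUGCGC
```

Notice that the result matches the other DNA strand (ATGCGC) with every T swapped for a U, which is why the RNA works as a faithful copy of the gene.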

Step 2. RNA Processing (Customising the Recipe)

The second stage is called RNA processing, where the new strand of RNA goes through some modifications.

Here, various little enzymes come along and attach a cap to one end of the RNA strand and a tail to the other, which help determine how long it survives.

Molecular machines called spliceosomes then cut out the non-coding stretches of the gene's RNA copy (introns) and stitch the remaining stretches (exons) back together. For the average gene, only about 1,200 of its 27,000 bases are left to be expressed.

There is a pretty spectacular bit of biology going on here. Through a process known as alternative splicing, the same RNA can be spliced in different ways, so the end product of gene expression (a protein) depends on which exons are kept.

It means that a single gene can be expressed in numerous different ways, which is super efficient.
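Here's a toy Python sketch of splicing and alternative splicing. The pre-mRNA below is entirely made up, and it labels its introns and exons up front; real spliceosomes have to recognise intron boundaries from sequence signals (the made-up introns here at least start with GU and end with AG, as almost all real introns do). The `keep` parameter is a hypothetical stand-in for whatever cellular signals decide which exons survive.

```python
# A made-up pre-mRNA, written as (kind, sequence) segments.
PRE_MRNA = [
    ("exon", "AUGGCU"),
    ("intron", "GUAAGUCCUAG"),
    ("exon", "GAAACC"),
    ("intron", "GUGAGCUUAG"),
    ("exon", "UGGUAA"),
]

def splice(segments, keep=None):
    """Drop the introns and join the exons back together.

    `keep` optionally lists which exons (by position) to retain,
    mimicking alternative splicing; by default every exon is kept.
    """
    exons = [seq for kind, seq in segments if kind == "exon"]
    if keep is None:
        keep = range(len(exons))
    return "".join(exons[i] for i in keep)

print(splice(PRE_MRNA))          # AUGGCUGAAACCUGGUAA
print(splice(PRE_MRNA, [0, 2]))  # AUGGCUUGGUAA (middle exon skipped)
```

One made-up gene, two different messages: that's alternative splicing in miniature.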

It's ironic then that so much of DNA is non-coding junk, which is super inefficient. Damn you, nature.

Step 3. Translation (Following The Recipe)

The third stage of gene expression is called translation because there is a change of language: from bases to amino acids.

The spliced RNA strand, now called messenger RNA (mRNA), pops out of the cell nucleus and into the cytoplasm. Here, it attaches to a ribosome, a molecular complex that reads the base sequence in groups of three bases (called codons).

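Adapter molecules called transfer RNAs (tRNAs) each carry an anticodon at one end and the matching amino acid at the other. As each tRNA's anticodon pairs up with a codon on the mRNA, the ribosome links the amino acids together, translating the sequence of codons into a sequence of amino acids.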

What emerges is a long chain of amino acids, also known as a polypeptide: the foundation of a protein.
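To see the reading-frame idea in action, here's a simplified Python sketch of a ribosome. It finds the first AUG, then reads three bases at a time until it hits a stop codon. The lookup table is only a small excerpt of the standard genetic code (enough for the example), and codons missing from it come out as "?". Real ribosomes find their start point in more sophisticated ways, so treat this as an illustration only.

```python
# A small excerpt of the standard genetic code, written in RNA letters.
RNA_CODONS = {
    "AUG": "Met",  # methionine, also the usual start signal
    "GCU": "Ala", "GAA": "Glu", "ACC": "Thr", "UGG": "Trp",
    "CGC": "Arg", "GUU": "Val", "GUC": "Val",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}

def translate(mrna: str) -> list[str]:
    """Read mRNA three bases at a time from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing gets built
    chain = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = RNA_CODONS.get(mrna[i:i + 3], "?")  # "?" = not in this excerpt
        if amino_acid == "STOP":
            break
        chain.append(amino_acid)
    return chain

print("-".join(translate("GGAUGGCUGAAACCUGGUAAGG")))  # Met-Ala-Glu-Thr-Trp
```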

In diagrams it's usually drawn in purple, because proteins seem to be purple in nearly every biology textbook. I don't know who started that, but it's a nice bit of synaesthesia to help you learn your molecular groupings. Incidentally, if you aren't using colours in your notes, it's never too late to start.

The chain then folds into a specific three-dimensional shape, sometimes teaming up with other chains, to form a working protein: the ultimate product of gene expression. This is how DNA works moment to moment inside your cells.

The Genetic Code

"Tell me more about the codons!" I hear you scream. And you'd be right. This is a good thing to scream about, if anything is.

The relationship between codons and amino acids is defined by the genetic code, which is shared, with only minor variations, by nearly all life on Earth.

For instance, the base sequence C-G-C codes for the amino acid arginine. The sequence A-T-G codes for methionine (as well as being the instruction to start building the polypeptide). Three different codons, called stop codons, tell the ribosome it's reached the end and should stop translating.

Here's the complete code if you're interested.

You'll note there are 64 possible codons (4 × 4 × 4) but only 20 amino acids. Three of those codons are stop signals, so the remaining 61 share out the 20 amino acids, and most amino acids have more than one codon. This creates a fair bit of redundancy in the genetic code.

This may well be a very good thing, because it dampens the effects of mutations: a switch from G-T-T to G-T-C, for example, still codes for valine. Such silent mutations leave the protein unchanged, which helps protect against inherited diseases. Check out my post on animal evolution for more on how mutations can be good, bad, or neutral.
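The whole code is small enough to build in a few lines. This Python sketch encodes the standard genetic code as a 64-character string of one-letter amino acid codes (the compact layout NCBI uses for its standard code table, with bases ordered T, C, A, G and `*` marking stops), then checks the numbers quoted above: 64 codons, 20 amino acids, three stop codons, and the silent G-T-T to G-T-C change. The helper names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TTT, TTC, TTA, ... GGG order.
# '*' marks a stop codon; other letters are one-letter amino acid codes.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last base fastest, matching the string's order.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def is_silent(before: str, after: str) -> bool:
    """True if a codon change leaves the encoded amino acid unchanged."""
    return CODON_TABLE[before] == CODON_TABLE[after]

amino_acids = set(CODON_TABLE.values()) - {"*"}
stops = [codon for codon, aa in CODON_TABLE.items() if aa == "*"]

print(len(CODON_TABLE), len(amino_acids), sorted(stops))  # 64 20 ['TAA', 'TAG', 'TGA']
print(CODON_TABLE["CGC"], CODON_TABLE["ATG"])             # R M
print(is_silent("GTT", "GTC"))                            # True
```

Here R is arginine, M is methionine and V (the value for both G-T-T and G-T-C) is valine.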

As amazing as this naturally occurring complexity may sound, it's also really cool how scientists have figured it all out and translated it into terms that non-specialists can understand. This makes me very happy.

What Are Proteins For?

Now that we understand, in broad strokes, how DNA is used to make proteins, what's next? What do the proteins do?

Proteins are large complex molecules with myriad essential roles in the body. Once a protein is cooked up, it may be released by the cell to fulfil its destiny elsewhere in the body.

That might be insulin, for example, released into your bloodstream by cells in your pancreas. It's kinda important.

Sometimes proteins are retained inside the very cell that made them, because cells need proteins too. Haemoglobin, which carries oxygen around the body, is made inside developing red blood cells and stays there. Some of the machinery described above also falls into this category, such as the protein parts of the spliceosomes used in RNA processing and of the ribosomes used in translation.

See What Happened to Gene Therapy? for an overview of how gene therapy works to combat faulty DNA in critical genes and where clinical trials stand today.

All of this takes place on an astonishing scale. A single ribosome adds several amino acids to a growing chain every second, finishing a typical protein in a minute or two.

And there are up to 10 million ribosomes in a typical body cell, all working in parallel to enable mass translation.

A single cell can therefore throw out vast numbers of protein molecules when it needs to, and does so alongside thousands of other cells at once.
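For a rough sense of scale, here's a back-of-the-envelope calculation in Python. The 10 million ribosomes come from the text; the elongation rate (5 amino acids per second) and protein length (400 amino acids) are illustrative assumptions, not measurements. The result is an upper bound, because it assumes every ribosome is busy all the time.

```python
def proteins_per_second(ribosomes: int, aa_per_second: float, protein_length: int) -> float:
    """Upper-bound protein output if every ribosome is busy at once."""
    seconds_per_protein = protein_length / aa_per_second
    return ribosomes / seconds_per_protein

# Illustrative assumptions, not measurements:
RIBOSOMES = 10_000_000   # upper figure quoted in the text
AA_PER_SECOND = 5        # assumed elongation rate per ribosome
PROTEIN_LENGTH = 400     # assumed length of a typical protein

print(PROTEIN_LENGTH / AA_PER_SECOND)  # 80.0 (seconds per protein, per ribosome)
print(f"{proteins_per_second(RIBOSOMES, AA_PER_SECOND, PROTEIN_LENGTH):,.0f}")  # 125,000
```

Even with each ribosome taking over a minute per protein, sheer numbers mean a single cell could, in principle, turn out well over a hundred thousand proteins every second.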

And that's pretty much how DNA works. It's an extraordinarily complex choreography of biological molecules, culminating in the everyday functioning of a living organism, such as a friendly newt or toad. Isn't that brilliant?