The Human Genome Project—discovering the human blueprint

Expert reviewers

Essentials

The blueprint for any living organism is contained in its DNA. DNA is a long molecule made up of many smaller units, called nucleotide bases

The order, or sequence, of the bases within the DNA provides the instructions for creating that organism—the genetic code

Functional chunks of DNA with particular combinations of base pairs are called genes

A genome is an organism's complete set of DNA—all of its genes and other non-genic DNA

The human genome is the complete set of instructions required to build a human being

Although every person on our planet is built from the same blueprint, no two people are exactly the same. While we are similar enough to readily distinguish ourselves from other living creatures we also celebrate our individual uniqueness. So what is it that makes us all human, yet unique? Our DNA.

The stuff that makes us who we are

Our DNA (Deoxyribo Nucleic Acid) is found in the nucleus of every cell in our body (apart from red blood cells, which don’t have a nucleus). DNA is a long molecule, made up of lots of smaller units. To make a DNA molecule you need:

If you take one of the four nitrogenous bases, and put it together with a sugar molecule and a phosphate molecule, you get a nucleotide base. The sugar and phosphate molecules connect the nucleotide bases together to form a single strand of DNA.

Two of these strands then wind around each other, making the twisted ladder shape of the DNA double helix. The nucleotide bases pair up to make rungs of the ladder, and the sugar and phosphate molecules make the sides. The bases pair up together in specific combinations: A always pairs with T, and C always pairs with G to make base pairs.

Put three billion of these base pairs together in the right order, and you have a complete set of human DNA—the human genome. This amounts to a DNA molecule about a metre long.

It’s the order in which the base pairs are arranged—their sequence—in our DNA that provides the blueprint for all living things and makes us what we are. The DNA sequence of the base pairs in a fish’s DNA is different to those in a monkey.

The base pair sequence of all people is nearly identical—that’s what makes us all humans. However, there are small differences in the order of the three billion base pairs in everyone’s DNA that cause the variations we see in hair colour, eye colour, nose shape etc. No two people have exactly the same DNA sequence (except for identical twins, because they came from a single egg that split into two, forming two copies of the same DNA).

We get our DNA from our parents. The DNA of the human genome is broken up into 23 pairs of chromosomes (46 in total). We receive 23 from our mother and 23 from our father. Egg and sperm cells have only one copy of each chromosome so that when they come together to form a baby, the baby has the normal 2 copies.

Three billion is a lot of cats to herd

Three billion is a lot of base pairs, and together they contain an enormous amount of information. If they were all written out as a list, they would fill around 10,000 epic fantasy-novel-sized (think Game of Thrones thickness) books. They aren’t just random lists of information though. Rather, within this long string, there are distinct sections of DNA that affect a particular characteristic or condition. These stretches of DNA are known as genes. Their base pair sequence is used to create the amino acids that join together to make a protein. Some genes are small, only around 300 base pairs, and others contain over one million.

Genes make up only around 1.5 per cent of our DNA—the rest is extra that initially didn’t appear to have any specific purpose, and was dubbed ‘junk DNA’. Turns out, though, that at least some of this ‘junk’ is actually pretty useful—it’s used to define where some genes start and finish, and to regulate how the genes behave. While most of the junk DNA comes from copies of virus genomes that invaded our distant ancestors, new studies suggest much of this DNA may have also gained functions during our evolution.

Genes contain information to make proteins

Within a gene, the base pairs are read in sets of three and those sets are called codons. These are triplets of base pairs that provide a ‘code’ for the production of a particular amino acid. Amino acids then combine together to build proteins. Proteins build all living structures as well as acting as catalysts (enzymes) that control biochemical reactions. Proteins build tissues, and tissues build the organs that make up our body. The genes that determine that you will have brown eyes contain instructions for the cells in the iris of your eye to make a brown-coloured protein. A different sequence of bases would spell a different message, making different proteins and give blue eyes—rather like spelling out a different sentence using the same letters of the alphabet.

Genes can be switched on or off

So if every cell in our body contains the same DNA, how do we end up with the complex arrangement of different cells that is the human (or any other creature’s, for that matter) body?

The secret is that although every cell contains the same sequence of genes, not every gene is ‘switched on’ or expressed in every cell. The cells that make pigment in the eye also contain the genes for making tooth enamel or liver cell proteins, but fortunately don’t do so because those genes are inactive in the eye cells. There are stretches of DNA that do not code for proteins, but rather act as the ‘punctuation’ within the genome that controls the functioning of genes and other processes.

It’s all of this—the genes plus the 'punctuation' plus the ‘junk’—that makes up our genome.

Why study our genome?

Working out the sequence of the base pairs in all our genes enables us to understand the code that makes us who we are. This knowledge can then give us clues on how we develop as embryos, why humans have more brainpower than other animals and plants, and what happens in the body to cause cancer. But establishing the sequence of three billion base pairs is a BIG task. The great and ambitious research program that sought to do this was called the Human Genome Project.

The idea of the Human Genome Project was born in the 1970s, when scientists learned how to ‘clone’ small bits of DNA, around the size of a gene. To clone DNA, scientists cut out a fragment of human DNA from the long strand and then incorporate it into the genome of a bacteria, or a bacterial virus. The fragment is then is replicated within the bacterial cell many times and every time the bacterial cell divides, the new cells also contain the introduced DNA fragment. Bacterial cells reproduce prolifically, and so this process ends up making millions of cells that all contain the introduced DNA fragment, enough that researchers can study it in detail and figure out the sequence of the base pairs.

With time, researchers have been able to study an ever greater number of different DNA fragments, that is, different genes. It became clear that certain variant DNA sequences were associated with particular conditions: diseases such as cystic fibrosis or breast cancer, or normal, non-harmful variants like red hair.

There was initially a lot of opposition to the Human Genome Project, even from some scientists. Considering only around 1.5 per cent of our genome is actual genes that code for proteins, it was thought that much of the $3 billion cost to sequence the entire human genome would be wasted on the ‘junk’ DNA that scientists thought didn’t get used. The important role the ‘junk’ DNA plays in gene regulation wasn’t yet appreciated.

Research groups in many countries, including Australia, began to sequence different genes, providing the beginnings of a total human gene map. In 1989, the Human Genome Organisation (HUGO) was founded by leading scientists to coordinate the massive international effort involved in collecting sequence data to unravel the secrets of our genes.

Francis Collins, former director of the National Human Genome Research Institute, led the Human Genome Project. Image credit: World Economic Forum on Flickr.

The Human Genome Project

So complex that at first it seemed unachievable

The Human Genome Project aimed to map the entire genome, including the position of every human gene along the DNA strand, and then to determine the sequence of each gene’s base pairs. At the time, sequencing even a small gene could take months, so this was seen as a stupendous and very costly undertaking. Fortunately, biotechnology was advancing rapidly, and by the time the project was finishing it was possible to sequence the DNA of a gene in a few hours. Even so, the project took ten years to complete; the first draft of the human genome was announced in June 2000.

Humans surprisingly simple?

In February 2001, the publicly funded Human Genome Project and the private company Celera both announced that they had mapped virtually all of the human genome, and had begun the task of working out the functions of the many new genes that were identified. Scientists were surprised to find that humans only have around 25,000 genes, not much more than the roundworm Caenorhabditis elegans, and fewer than a tiny water crustacean called Daphnia, which has around 30,000. However, genome sequencing was making it clear that an organism's complexity is not necessarily related to its number f genes.

Also, while we might have a surprisingly small number of genes, they are often expressed in multiple and complex ways. Numerous genes have as many as a dozen different functions and may be translated into several different versions active in different tissues. We also have a lot of extra DNA that doesn’t make up specific genes. So even though the puffer fish Tetraodon nigroviridis has more genes than we do—nearly 28,000—the size of its entire genome is actually only around one tenth of ours as it has much less of the non-coding DNA.

In April 2003, the 50th anniversary of the publication of the structure of DNA, the complete final map of the Human Genome was announced. The DNA from a large number of donors, women and men from different nations and of different races, contributed to this ‘typical’ Human Genome Sequence.

Of the 25,000 or so human genes that have been identified as coding for proteins, most exist in several sequence variants, called alleles. Sometimes these variations are harmless. The gene that codes for eye colour has several alleles—one for blue eyes, another for brown eyes. Sometimes these genetic variations can cause a disease. For example, a mutation in the gene that transports ions across the membrane of lung cells can cause cystic fibrosis.

So, although our alleles may be different, all humans mostly share the same genes. The Human Genome Project identified the full set of human genes, sequenced them all, and identified some of the alleles, particularly those that can cause disease when they get mutated.

Genes can be mapped relative to physical features of the chromosome, or relative to other genes. When different genes are close together on the same chromosome they are said to be linked because they will usually be passed on together (‘co-inherited’) to a child. However, chromosomes break and re-join when eggs and sperm are formed (‘meiosis’), so even genes that are close together can sometimes become separated. The closer together the genes are, the more likely they are to stay together. Analysing how often genes become separated from each other can help establish the distance between genes, and produce a genetic linkage map. In the Human Genome Project, the first task was to make a genetic linkage map for each chromosome.

A genetic linkage map is made from studying patterns in gene separation, and shows the relative locations of genes on a chromosome. It does not tell us anything about the actual physical distances between the genes. A physical map, made by hybridizing a fluorescent-tagged probe to chromosomes, can be aligned with the linkage map. Molecular scale maps can be constructed from sequence markers in the DNA molecule, and quantifies these distances, usually in terms of how many base pairs there are between genes. Together, the genetic linkage map, the physical map, molecular maps and sequence give us the complete picture of the genome.

It's all about me

It’s all very nice to make a map of these three billion pairs and figure out how they all fit together in order to understand the fundamental essence of a human being. But how much significance does this have for our everyday lives?

Actually, quite a lot. As the cost of sequencing a genome plummets—the first human genome sequenced in 2003 cost somewhere in the order of US$2.7 billion, while it can now be done for less than US$1,000—doctors have a new and extremely powerful tool at their disposal. Identifying how our genes interact and which parts of our genome affect certain diseases and conditions has meant that doctors and scientists are able to better understand how these conditions work and how to treat them. Combine this with the exact knowledge of a particular person’s genes and their mutations, and we are embarking on a new age of personalised medicine.

Doctors can tailor a patient’s medical treatment to be an exact fit, the way a tailor adjusts a suit or a dress for each individual. Drug treatments can be developed that are based on specific genetic mutations, and doctors may be able to diagnose a disease in a patient who is not showing typical symptoms. Scientists anticipate that soon we will move from a “one drug fits all” style of treatment to a more effective, highly personalised, targeted approach. For example, based on a patient’s genome, doctors may be able to predict if they will respond to certain cancer therapies. This can help avoid putting the patient through devastating chemotherapy treatments unnecessarily.

Mapping an individual’s genome can also provide doctors with the ability to predict or anticipate any diseases that the individual may be predisposed to. These conditions could then be addressed with a preventative approach, before they take serious hold.

Ethical controversies

There is no doubt that information from the Human Genome Project provides huge benefits to human health in helping to understand and treat genetic diseases (such as breast cancer, cystic fibrosis and sickle cell anaemia). However, some people see ethical issues, and wonder if scientists are “playing God” with our genomes.

Could genetic information be misused; for example, through genetic discrimination by employers or insurance companies? Most people agree that gene testing can be used ethically to prevent serious diseases such as cancer, or during pregnancy to avoid the birth of someone with a severe handicap, but should we allow gene testing to choose a child who will be able to be better at sports, or more intelligent? What about sex selection, already a problem in some countries? And will it become possible to use genetic information to change genes in children or adults for the better? Do we really want to know if we run the risk of developing a particular disease that may or may not be treatable? What are the privacy issues regarding genome screening on a population scale?

All of these ethical, legal and social issues associated with genetic information are being considered worldwide by scientists and ethicists. The potential for medical advancement is immense, but as with so many other great scientific advances, new knowledge brings huge new responsibilities.