This course introduces you to the basic biology of modern genomics and the experimental tools that we use to measure it. We'll introduce the Central Dogma of Molecular Biology and cover how next-generation sequencing can be used to measure DNA, RNA, and epigenetic patterns. You'll also get an introduction to the key concepts in computing and data science that you'll need to understand how data from next-generation sequencing experiments are generated and analyzed.
This is the first course in the Genomic Data Science Specialization.

Reviews

NL

Great introduction to Genomics. The instructors really laid out the practical terms and gave great resources. Weeks 3 and 4 ramp up in difficulty, so some outside research in statistics is helpful.

GH

Oct 06, 2019


This course provides a gentle introduction to the concepts and terminology associated with genomics. As someone who never took a biology course in high school or university, this course is perfect.

From the lesson

Overview

In this Module, you can expect to study topics of "Just enough molecular biology", "The genome", "Writing a DNA sequence", "Central dogma", "Transcription", "Translation", and "DNA structure and modifications".

Taught By

Steven Salzberg, PhD

Jeff Leek, PhD

Transcript

This lecture is about the important molecules of molecular biology. I am only going to talk about a few key molecules, because they're the ones that we talk about throughout the course, and it's important that you know their names and what they are. There are thousands of molecules that we could talk about, and they are all important, so I don't mean to imply that these are the only important molecules in molecular biology. But they're among the most critically important molecules in determining how your genome functions. And if you get confused, or you wonder whether this is really that important, just remember that every one of these molecules is in every cell in your body. If you had a powerful enough microscope, you could point it at your skin right now, zoom in, and see all the molecules I'm going to talk about.
So DNA is the molecule that makes up all of our genetic material, and it's composed of four different nucleotides. A, G, C, and T is how we write them, but they're actually named adenine, guanine, cytosine, and thymine, and biochemically they have slightly different structures. Adenine and guanine are called purines, and they have a two-ring structure, shown here. Thymine and cytosine are pyrimidines, and they have a one-ring structure. You don't need to remember these structures, but it's good to know that C and T are similar to one another and a little smaller, and A and G are similar to one another and a little bigger. The way DNA is constructed is that these molecules bind together in a very specific way: As always bind to Ts, and Gs always bind to Cs. And this is true across all living things: everybody has the same kind of DNA, and we all have the same structure of DNA.
So this rule is very useful. It was discovered way back in the 1950s, when Watson and Crick discovered the structure of DNA. They immediately realized that this binding property means that when you have one of the strands, you also know what the other is. So if I give you one strand of DNA with As, Cs, Gs, and Ts, you know what the other strand is, because everywhere there's an A on one strand there must be a T on the other strand, and correspondingly, everywhere there's a G on one strand there must be a C on the other strand. So this provides a mechanism, as Watson famously observed when they discovered the structure, for how DNA copies itself and passes itself on from one generation to another.
So the structure of DNA, then, in these long strings, gets put together in this famous double helix structure. The As bind to Ts and the Gs bind to Cs, and you've built these long ladders, and the ladders are twisted in a long helix. Every one of your cells has all of your DNA in it in these structures. As we've said before, the DNA in our genome is organized into 23 chromosome pairs. Each of these chromosomes is a very, very long string like this. The longest chromosomes in the human genome are on the order of 250 million nucleotides long. So they're extremely long molecules, very tightly coiled up and packed together inside the nucleus of every cell in your body.
So the way that we're going to write the DNA sequence itself looks like this. We'll write As, Cs, Gs, and Ts. We don't write the chemical structures; we abbreviate with these four letters. And DNA actually has a direction, a strandedness. Based on its biochemical properties, we call one end of the DNA the five prime end and the other the three prime end. That has to do with the structures of those biochemical molecules, but you don't really need to remember that.
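The base-pairing rule described above (A with T, G with C) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the names `PAIRS` and `complement` are made up for this example, and the input is assumed to contain only the letters A, C, G, and T.

```python
# Watson-Crick pairing: each base determines its partner on the other strand.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base paired with each position of `strand`, in the same order."""
    return "".join(PAIRS[base] for base in strand.upper())


print(complement("ACGT"))  # TGCA
```

Applying `complement` twice gives back the original strand, which is the sense in which knowing one strand tells you the other.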
Just remember that we always write it in the same direction, where five prime is always first and three prime is always second. And we try to write the opposite strand the other way around. So the strand that goes five prime to three prime, because we're writing things that way, we call the positive or plus strand. And the other strand, the reverse complement, is the negative strand.
So another critical molecule for how our bodies and our genomes work is RNA. RNA is almost exactly like DNA except for a couple of important differences. The most obvious difference is that we don't have the T anymore. We don't have thymine; instead we have uracil. So when DNA gets copied, or transcribed, into RNA, the As get replaced by As, Gs get replaced by Gs, Cs get replaced by Cs, but Ts get replaced by Us, or uracil. And then you build an RNA molecule which, unlike DNA, is single stranded. RNA molecules are not double stranded, although they can form double-stranded complexes. But in general RNA is single stranded, and it is from this RNA template that we create proteins, which are the other critical molecule in a cell. RNA has, again, similar biochemical structures. The uracil, or U, molecule is very similar to the T molecule. So we write RNA the same way we write DNA, five prime to three prime, only we replace all the Ts with Us. If you see a string of letters and there are Us in it instead of Ts, you know immediately that that's RNA, not DNA.
An important distinction genetically is that DNA is the stuff of inheritance. DNA is what cells carry with them from one cell generation to another: whenever a cell divides, it creates DNA that replicates the DNA in the original cell. The RNA is used as a template to make proteins, but the RNA is not actually the stuff of inheritance. However, it is in most cases an identical copy of the DNA that it was formed from.
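The writing conventions above can be illustrated with a short Python sketch. It assumes sequences are plain strings written five prime to three prime, and it follows the lecture's simplified picture of transcription as copying a strand with every T replaced by U. The function names are hypothetical, chosen for this example only.

```python
# Watson-Crick pairing used to build the opposite strand.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the minus strand, also written five prime to three prime."""
    return "".join(PAIRS[base] for base in reversed(dna.upper()))


def transcribe(dna: str) -> str:
    """Write the RNA copy of a DNA strand: same letters, with T replaced by U."""
    return dna.upper().replace("T", "U")


def looks_like_rna(seq: str) -> bool:
    """Heuristic from the lecture: Us present and no Ts suggests RNA."""
    s = seq.upper()
    return "U" in s and "T" not in s


plus_strand = "ATGCCG"
print(reverse_complement(plus_strand))  # CGGCAT
print(transcribe(plus_strand))          # AUGCCG
```

Note that `looks_like_rna` cannot classify a sequence with neither T nor U, such as "ACG"; it only captures the rule of thumb given in the lecture.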
So we use these molecules to encode how the cell works. DNA is basically a program that we read out. The readout starts with RNA, and then the RNA is used to make proteins. Proteins are also long molecules, though not nearly as long as DNA. They're typically hundreds or sometimes thousands of amino acids long. Amino acids are more complicated molecules (there's a picture of one here) that are strung together to make proteins. The translation rules were worked out in the 1960s, in some groundbreaking work at the very founding of the field of molecular biology. Scientists figured out, essentially one codon at a time, how you translate RNA into proteins.
So I just used the word codon. The way that RNA gets turned into proteins is that every combination of three letters of RNA encodes an amino acid. There are 64 possible such combinations. Of those 64 combinations, 61 encode amino acids and three are stop codons. The translation machinery reads along the RNA molecule three nucleotides at a time, and for every three nucleotides it adds an amino acid. Those are built together into a long string of amino acids, which is what we call a protein. When the translation machinery hits one of the stop codons, one of the three special codons that don't encode anything, it stops, and that's the end of the protein. So that's how it knows where to stop.
We write protein sequences in a particular direction as well. And here's a translation of a particular set of nine nucleotides producing three amino acids. We write proteins in the 20-letter alphabet we use to abbreviate the amino acids. There are 20 amino acids that make up essentially all of our proteins. Actually, to tell the whole story, there are more than 20 amino acids; there are actually 22.
The 21st amino acid was discovered not that long ago, and the 22nd also not that long ago. These amino acids are primarily used in other forms of life besides humans. So there are a few exceptions to almost every rule in biology. But in general, the way to think about human biology is that we have 64 possible codons, 61 of them encode amino acids, and together they encode exactly 20 amino acids. That 21st amino acid, when it was discovered, turned out to be encoded by one of the stop codons, which is once in a while used to encode an amino acid.
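The codon arithmetic and the read-three-at-a-time rule described above can be checked with a small Python sketch. It assumes the standard genetic code written in the common compact form (bases ordered U, C, A, G, with `*` marking a stop codon) and ignores the rare exceptions mentioned in the lecture. The names `CODON_TABLE` and `translate` are hypothetical, and the nine-nucleotide input is an illustrative example, not the one shown on the lecture slide.

```python
from itertools import product

# Standard genetic code, listed with the first base varying slowest
# and the third base fastest, over the order U, C, A, G. "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Read `rna` three letters at a time and stop at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(len(CODON_TABLE))                  # 64
print(stops)                             # ['UAA', 'UAG', 'UGA']
print(len(set(AMINO_ACIDS) - {"*"}))     # 20
print(translate("AUGGCCAAG"))            # MAK
```

Any trailing one or two nucleotides that do not form a full codon are ignored, which is a simplification made for this sketch.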
