An increasing volume of data is becoming available in biomedicine and healthcare, from genomic data to electronic patient records and data collected by wearable devices. Recent advances in data science are transforming the life sciences, leading to precision medicine and stratified healthcare.
In this course, you will learn about some of the different types of data and computational methods involved in stratified healthcare and precision medicine. You will gain hands-on experience of working with such data, and you will learn from leaders in the field about successful case studies.
Topics include: (i) Sequence Processing, (ii) Image Analysis, (iii) Network Modelling, (iv) Probabilistic Modelling, (v) Machine Learning, (vi) Natural Language Processing, (vii) Process Modelling and (viii) Graph Data.
Watch the course promo video here: http://edin.ac/2pn350P

SJ

One of the great courses I have ever done through Coursera! Thank you to Dr. Wong and Dr. Areti for their great contribution!

KA

Jun 27, 2019

A very knowledgeable course that gives insight into the upcoming advances in healthcare with the help of data science.

From this lesson

WELCOME TO WEEK 2

This week you will be introduced to Sequence Processing and Medical Image Analysis. Explore the course materials to find out about recent advances in these areas and how they contribute to Precision Medicine!

Taught by

Dr Areti Manataki

Teaching and Research Fellow

Dr Frances Wong

Data Science MOOC Project Lead

Transcript

Hello, Tim. Please tell us a bit about yourself and your current role with Edinburgh Genomics.

My name is Tim Aitman. I'm Professor of Molecular Pathology and Genetics at the University of Edinburgh. And for the last 20 years or so, I've been researching the genetics of rare and common diseases. In particular, in the last 10 years, I have been using information and technology about the genome to give insights in those fields. When I came to Edinburgh three and a half years ago, I was asked to set up a genome sequencing facility at Edinburgh Genomics, and as a result of that, I'm the Clinical Director of the whole genome sequencing facility at Edinburgh Genomics. Edinburgh Genomics is based on two campuses. One of them is at King's Buildings, and that has a wide range of sequencing technologies and applications, and the second one is at Easter Bush. This is my whole genome sequencing facility, and this is mainly concerned with sequencing whole genomes.

Please tell us about the Edinburgh Genomics facility and what services it can provide.

So, the services at Edinburgh Genomics Clinical are to sequence whole genomes. That's the main activity of the facility there. And this could be animal genomes, for example, in the context of agriculture, such as sheep, or pigs, or chickens, or it could be humans. And about two thirds of the genomes that have been sequenced are human genomes. That's in relation to human genetics research, and in relation to healthcare in the Scottish NHS.

What cutting-edge technologies have Edinburgh Genomics recently acquired?

So, in order to try and give perspective on what Edinburgh Genomics, the clinical facility, does, I think I'd just take us back a few years. So, the first human genome sequence was completed in 2001. And it was completed at a cost of about £2 billion, and took 15 years to produce. So, things have changed rather markedly since then.
And in our sequencing facility at Easter Bush, we can sequence a genome in two to three days. It costs under £1,000. And in fact, since the facility started sequencing about 18 months ago, we have sequenced 8,000 genomes. So, that is an extraordinary revolution, and it is this change in cutting-edge genome technology that has allowed us to do that. And of course, a human genome is a lot of information. It's 3,000 million letters or base pairs as we call them. Those are usually covered 30 to 60 times when one sequences a genome. And this is why we have a requirement for analyzing and storing terabytes of data. And in fact, the computational side is almost as important as the sequencing technology, and we have a compute facility where we regularly store about two petabytes of data, and we can expand that fairly easily to five petabytes. This is quite a substantial computing facility, and is one of the key reasons why the facility has been successful.

How many genomes are sequenced per week at Edinburgh Genomics?

So, in terms of the throughput of the facility, we have a capacity for sequencing several hundred genomes per week. We usually are sequencing around 150 to 200 genomes per week. I should explain how that can happen, because as you can imagine, sequencing a genome is not a trivial or straightforward task. Particularly, it's remarkable when you think that we only have three technicians or research assistants in the labs who carry out the lab work. And the reason that they can do that is because firstly, we have a remarkably efficient laboratory information management system. Quite costly, but it means that we track the samples through from the very beginning to the end, and each time a new procedure or a new person is involved with handling that sample, it talks to the laboratory information management system, and tells the system where the sample is. And, in addition to that, every sample is handled almost exclusively by robots.
So, the robots obviously are programmed by humans. But because it's robotic, it can be very high throughput. And so, we can easily handle 8, 16, or up to 48 samples at a time, meaning that we can have this very high throughput with very few people in the lab.

What are the main challenges in genome sequencing and analyzing sequence data?

What are the main challenges in analyzing human sequence data? Well, part of this relates to the volume of the data itself, because we cover each part of the genome 30 times, and there are, as mentioned, three billion base pairs in the genome. And so, each of those segments is sequenced in small pieces. And each of those small pieces goes into individual computer files. So, we have millions of computer files as the immediate output of the sequencer. Those have to then be joined up, and we join them up by mapping them to a reference genome, and that is done by now fairly standard software, and those reads are mapped to the genome, piled up, and then compared to the reference genome. When we find differences between that individual's genome and the reference genome, we call that a sequence variant. And there are around four million variants in any one individual's genome compared to the reference. And then, we can say are any of those significant? How do we assess their significance? We look to see whether they're common in populations, which part of the genome they are in. For example, are they in a gene that codes for a protein or are they in regulatory DNA between genes. And if they're in a gene itself, what are the predicted effects on the protein structure and activity? And this is where the skill is derived because we then have to decide how significant that is for the protein function.
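The mapping, pile-up and variant-calling steps described in this answer can be illustrated with a toy sketch. This is not the facility's software: the function names (`map_read`, `pile_up`, `call_variants`), the 20-base reference and the four reads are all invented for illustration, and the brute-force matching below stands in for the indexed aligners and statistical variant callers that a real pipeline would use.

```python
from collections import Counter, defaultdict

def map_read(reference, read, max_mismatches=1):
    """Return the 0-based offset where the read best matches the reference,
    or None if no placement has at most max_mismatches differences.
    ASSUMPTION: brute-force scan, substitutions only (no insertions/deletions)."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for a, b in zip(window, read) if a != b)
        if mm < best_mm:  # ties keep the earliest position
            best_pos, best_mm = pos, mm
    return best_pos

def pile_up(reference, reads, max_mismatches=1):
    """Stack mapped reads and count the bases seen at each reference position."""
    pileup = defaultdict(Counter)
    for read in reads:
        pos = map_read(reference, read, max_mismatches)
        if pos is None:
            continue  # unmapped read is dropped
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    return pileup

def call_variants(reference, pileup, min_depth=2):
    """Report positions where the most common observed base differs from
    the reference and at least min_depth reads cover the position."""
    variants = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        base, _ = counts.most_common(1)[0]
        if depth >= min_depth and base != reference[pos]:
            variants.append((pos, reference[pos], base))
    return variants

# Hypothetical toy data: the sample differs from the reference at position 10 (A -> G).
reference = "ACGTTAGCCGATAGGCTAAC"
reads = ["ACGTTAGC", "AGCCGGTA", "CCGGTAGG", "GGTAGGCT"]

variants = call_variants(reference, pile_up(reference, reads))
print(variants)  # [(10, 'A', 'G')]
```

Each tuple is (position, reference base, observed base). In this toy case three overlapping reads agree on G where the reference has A, which is the "sequence variant" idea at a scale of one position rather than four million.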
And for example, if we have an individual with a genetic disease or disorder, then we can say, could any one of those four million variants or the smaller number that are within genes for example, could they be responsible for that individual's disorder. And therefore could we make a definite genetic diagnosis from that person's genome sequence.

How has genome sequencing helped in unraveling the genetics of both common and rare human disorders?

Well, genome technology has really revolutionized our ability to find the causes of genetic disorders. If we look at the first part of that, there are around 7,000 genes that cause human genetic disorders. And 20 or 30 years ago, we really knew almost none of those. Now, we know about 4,000 of those so probably over half, and most of those have been found because we've been able to sequence the genomes of people with a particular disorder, look for variants or mutations that are in common between individuals with that disorder, including individuals in the same family, and then say, "Do we know the cause?" So, that's the first part of it, that new genome technology has helped us to establish the cause of thousands of genetic disorders. And then, the second part is, if we have an individual with a genetic disorder, can we track down the cause? And once again, genome technology really now is the key to the extent that if an individual presents with a presumed genetic disorder, many people would believe, and increasingly it is becoming so, that the way to find out the cause of their disease is simply by sequencing their whole genome, analyzing it for what variation they have in their genome and saying, can we find what is often the single letter or single base pair change that causes that individual's disease. This is quite a remarkable change when one considers that the first genome sequence was only available just about 15 years ago.

What do you see for the future of genomics?
So, the future of genomics, that's a fascinating question that geneticists, genomicists and social scientists all have a different perspective on. Just this week, I proposed the motion in the annual Edinburgh medical debate that the UK population should have genome screening. And what we agreed was that, genome sequencing was almost certainly accepted by the audience and the speakers to be the method of choice for diagnosing genetic disorders. And so, when one sequences a genome, the results are not completely certain. It may come up with a definitive diagnosis. But because there are four million variants, we often don't know what quite a number of them do. In fact, the overwhelming majority we won't know what they do. And in addition, just because we're looking maybe for the cause of someone's genetic disorder, we will find other variants that might be responsible for other genetic disorders. And so, if one's going to sequence someone's genome, even in the context of a disorder, one needs to have consented that individual, and inform them about what the results are likely to be, both for the information about their own disorder, if they have one, and for any other information that comes about. And those things need to be very tightly regulated, and individuals need to be consented, and informed about what the possible results might be, and what implications they might have, for example, for their insurance and their future health care.

So, I think that in the context of genetic disorders, most people agree that genetic testing is usually beneficial. But then the question arises, if we can sequence a genome for under £1,000 and give people their results back, what about people who don't have an obvious genetic disorder? Maybe they have a family tendency to more common diseases such as diabetes or heart disease or psychiatric disease. Can we, perhaps, predict, or give them some indication as to whether they are likely to be predisposed to these disorders?
And the answer is yes. Although we can't do it very precisely, and the information that we provide people with, might give them 1.5 or two fold increased risk, as opposed to the single gene disorders where if they have the gene, they are almost certain to get the disorder, such as Angelina Jolie found with breast cancer, and decided to take action by having mastectomies and ovariectomies. So, there is information that healthy people may derive about their future health care, and the question is, is that beneficial? It's certainly technically possible, £1,000, and a few days of a sequencer and a scientist to analyze it, it's definitely possible. But it does raise the question, should we be sequencing babies at birth? Should we be sequencing children so that they know what the future's going to hold for them? And very strongly I think, most people believe that we should not be sequencing genomes of people who cannot consent, which means that babies and children under the age of consent should not have their genome sequenced. But maybe later in their life, there was a view quite evenly balanced in the medical debate that people will benefit from having their genome sequenced.

In the next five years, it's likely that worldwide, between 50 and 500 million genomes will be sequenced. And so the question is, who should be doing it? Who should have their genome sequenced? Who should be regulating it, if any, or should people just be able to choose to have their own genome sequenced? I think that there are risks of just allowing people to have their genome sequenced without any regulation or counseling. But I think there are also advantages. And although there will be costs to the healthcare system, in the long term, I believe it will make us a healthier nation, and better informed about our futures, and better able to make sure that if there is predisposition to disease, we have the opportunity for early and the best treatment.
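The "1.5 or two fold increased risk" mentioned in this answer can be made concrete with a small calculation. The sketch below is only illustrative: the 10% baseline lifetime risk is a made-up figure, not a value from the interview or from any study, and `absolute_risk` is a hypothetical helper that simply multiplies a baseline risk by a relative risk.

```python
def absolute_risk(baseline_risk, relative_risk):
    """Scale a baseline probability by a relative risk, capped at 1.0.
    ASSUMPTION: a simple multiplicative model, used only for illustration."""
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a probability between 0 and 1")
    if relative_risk < 0:
        raise ValueError("relative_risk must be non-negative")
    return min(1.0, baseline_risk * relative_risk)

# HYPOTHETICAL baseline: 10% lifetime risk of some common disease.
BASELINE = 0.10

for fold in (1.5, 2.0):
    risk = absolute_risk(BASELINE, fold)
    print(f"{fold}-fold increase: {risk:.0%} lifetime risk")
```

Under this assumed baseline, a two-fold increase still leaves most carriers unaffected, which is the contrast being drawn with single gene disorders where carrying the gene makes the disorder almost certain.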
Thank you very much for joining us today, Tim.

It's a pleasure.