This course distills for you the expert knowledge and skills mastered by professionals in Health Big Data Science and Bioinformatics. You will learn exciting facts about the biology and chemistry of the human body, genetics, and medicine, intertwined with the science of Big Data and the skills to harness the avalanche of data that is openly available at your fingertips and that we are just starting to make sense of. We’ll investigate the different steps required to master Big Data analytics on real datasets, including Next Generation Sequencing data, in a healthcare and biological context: preparing data for analysis, completing the analysis, interpreting the results, visualizing them, and sharing them.
Needless to say, when you master these high-demand skills, you will be well positioned to apply for or move into positions in biomedical data analytics and bioinformatics. Whatever your skill level in biomedical or technical areas, you will gain highly valuable new or sharpened skills that will make you stand out as a professional and make you want to dive even deeper into biomedical Big Data. It is my hope that this course will spark your interest in the vast possibilities offered by publicly available Big Data to better understand, prevent, and treat diseases.

Taught By

Isabelle Bichindaritz

Associate Professor

Transcript

In this lesson, you're going to learn how to reduce data. In this course, we are potentially looking at very large data sets. Very often it's going to be important to test the different workflows and data analytics methods that we're going to use on just a subset of the data.

Data reduction can take several forms. One that is used very frequently is feature selection, where we select a subset of all the features available in a big data set. Sampling is when we select a subset of the rows in the data set. When you do sampling, it's generally so that later on you can apply the same data analysis to a larger sample or to the whole data set, to see whether the results hold on the larger data set. Data compression can be used particularly when you store data temporarily. Data aggregation means combining some data; for example, you may combine different features into one. And there are many other methods for data reduction.

Now a short lesson on feature selection. Feature selection is a form of dimensionality reduction. It's very important for biomedical data, where you often have a lot of features in your data set. This large number of features may come, for example, from genetic data or gene expression data. If we have 20,000 gene expressions to consider, the data are called high-dimensional, and the difficulty this creates is referred to as the curse of dimensionality.

You may also have data that repeat over time, even if you're working with electronic medical records. Some data are recorded several times a day, as in an intensive care unit type of environment. And over the life of a patient, you may have data that repeat quite often, such as blood pressure, temperature, and other vital signs. So there are a lot of applications that are high-dimensional, which means that they have a lot of features. And often the number of rows is going to be smaller than the number of features. This is more precisely what the curse of dimensionality refers to. We need feature selection to deal with this situation, and because of the importance of this topic, it will be addressed independently in a future module.
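
The four forms of data reduction named in the lesson can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the course: the toy data set, the function names, and the choice of variance as the feature selection criterion are all hypothetical, and real biomedical pipelines would typically use dedicated libraries and domain-specific criteria.

```python
import json
import random
import statistics
import zlib


def make_toy_data(n_rows=20, n_features=100, seed=0):
    """Hypothetical toy data with fewer rows than features.

    Feature j is drawn with standard deviation (j % 10), so every
    tenth feature is constant and carries no information.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, j % 10) for j in range(n_features)]
            for _ in range(n_rows)]


def select_features(rows, k):
    """Feature selection: keep the k columns with the highest variance.

    Variance is an assumed, purely illustrative criterion.
    """
    n_features = len(rows[0])
    variances = [statistics.pvariance([r[j] for r in rows])
                 for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[r[j] for j in keep] for r in rows]


def sample_rows(rows, n, seed=0):
    """Sampling: keep a random subset of n rows, without replacement."""
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(rows)), n))
    return [rows[i] for i in idx]


def aggregate_features(rows, group_size):
    """Aggregation: replace each run of adjacent features by its mean."""
    return [[statistics.fmean(r[i:i + group_size])
             for i in range(0, len(r), group_size)]
            for r in rows]


def compress(rows):
    """Compression: lossless zlib compression of a JSON serialization."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def decompress(blob):
    """Inverse of compress: recover the original rows."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


data = make_toy_data()
kept, reduced = select_features(data, k=10)       # fewer columns
subset = sample_rows(data, n=5)                   # fewer rows
pooled = aggregate_features(data, group_size=10)  # combined columns
blob = compress(data)                             # same content, fewer bytes
```

Feature selection and aggregation both shrink the number of columns, sampling shrinks the number of rows, and compression keeps every value but stores it in fewer bytes. As the lesson notes, a workflow tested on `subset` would later be rerun on a larger sample or on the full `data` to check that the results hold.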
