This course dives into the basics of machine learning using an approachable and well-known programming language: Python.
In this course, we will be reviewing two main components:
First, you will learn about the purpose of machine learning and how it applies in the real world.
Second, you will get a general overview of machine learning topics such as supervised vs. unsupervised learning, model evaluation, and machine learning algorithms.
Throughout the course, you will practice with real-life examples of machine learning and see how it affects society in ways you may not have guessed!
By putting in just a few hours a week over the next few weeks, here is what you'll get:
1) New skills to add to your resume, such as regression, classification, clustering, scikit-learn, and SciPy.
2) New projects that you can add to your portfolio, including cancer detection, predicting economic trends, predicting customer churn, recommendation engines, and many more.
3) A certificate in machine learning to prove your competency, which you can share anywhere you like, online or offline, such as on your LinkedIn profile and social media.
If you choose to take this course and earn the Coursera course certificate, you will also earn an IBM digital badge upon successful completion of the course.

AJ

This was a very informative course. The videos provided a good background on the concepts and I found the labs especially helpful for learning to implement Python code for each technique covered.

JJ

Dec 06, 2018

★★★★★

I am happy to have this online education. I dropped out of my nuclear engineering degree, and I am happy to learn practical things with a future... I work for IBM also... but I want to become a data scientist.

From this module

Clustering

In this section, you will learn about different clustering approaches. You will learn how to use clustering for customer segmentation, grouping similar vehicles, and clustering weather stations. You will also learn about the three main types of clustering: Partition-based Clustering, Hierarchical Clustering, and Density-based Clustering.
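The three clustering families above can be compared side by side with scikit-learn. The sketch below is illustrative only: the toy dataset, parameter values (`n_clusters=3`, `eps=0.8`, `min_samples=5`), and variable names are assumptions, not taken from the course materials.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

# Toy dataset: 150 points drawn around 3 well-separated centers.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=42)

# Partition-based: must specify the number of clusters up front.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Hierarchical: merges points/clusters bottom-up until 3 groups remain.
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Density-based: finds dense regions; points labeled -1 are treated as noise.
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
```

Note the design difference: the first two methods require the number of clusters as input, while DBSCAN discovers it from the density of the data.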

Instructor

SAEED AGHABOZORGI

Ph.D., Sr. Data Scientist

Transcript

Hello and welcome. In this video, we'll be covering DBSCAN, a density-based clustering algorithm which is appropriate to use when examining spatial data. So let's get started.

Most of the traditional clustering techniques, such as k-Means, hierarchical, and fuzzy clustering, can be used to group data in an unsupervised way. However, when applied to tasks with arbitrarily shaped clusters, or clusters within clusters, traditional techniques might not achieve good results; that is, elements in the same cluster might not share enough similarity, or the performance may be poor. Additionally, while partition-based algorithms such as k-Means may be easy to understand and implement in practice, the algorithm has no notion of outliers; that is, all points are assigned to a cluster even if they do not belong to any. In the domain of anomaly detection, this causes problems, as anomalous points will be assigned to the same cluster as normal data points. The anomalous points pull the cluster centroid towards them, making it harder to classify them as anomalous.

In contrast, density-based clustering locates regions of high density that are separated from one another by regions of low density. Density, in this context, is defined as the number of points within a specified radius. A specific and very popular type of density-based clustering is DBSCAN. DBSCAN is particularly effective for tasks like class identification in a spatial context. The wonderful attribute of the DBSCAN algorithm is that it can find arbitrarily shaped clusters without being affected by noise. For example, this map shows the locations of weather stations in Canada. DBSCAN can be used here to find groups of stations which show the same weather conditions. As you can see, it not only finds different arbitrarily shaped clusters, it can also find the denser parts of the data while ignoring less dense areas or noise. Now, let's look at this clustering algorithm to see how it works.
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. This technique is one of the most common clustering algorithms, and it works based on the density of objects. DBSCAN works on the idea that if a particular point belongs to a cluster, it should be near lots of other points in that cluster. It works based on two parameters: radius and minimum points. R determines a specified radius; if it includes enough points within it, we call it a dense area. M determines the minimum number of data points we want in a neighborhood to define a cluster.

Let's define the radius as two units. For the sake of simplicity, assume it is a radius of two centimeters around a point of interest. Also, let's set the minimum points, or M, to be six points, including the point of interest. To see how DBSCAN works, we have to determine the type of each point. Each point in our dataset can be either a core, border, or outlier point. Don't worry, I'll explain what these points are in a moment. But the whole idea behind the DBSCAN algorithm is to visit each point and find its type first; then we group points into clusters based on their types.

Let's pick a point randomly. First, we check to see whether it's a core data point. So, what is a core point? A data point is a core point if, within the R-neighborhood of the point, there are at least M points. For example, as there are six points in the two-centimeter neighborhood of the red point, we mark this point as a core point. Okay, what happens if it's not a core point? Let's look at another point. Is this point a core point? No. As you can see, there are only five points in this neighborhood, including the yellow point. So what kind of point is this one? In fact, it is a border point. What is a border point? A data point is a border point if (a) its neighborhood contains fewer than M data points, and (b) it is reachable from some core point. Here, reachability means it is within R distance of a core point.
This means that even though the yellow point is within the two-centimeter neighborhood of the red point, it is not by itself a core point, because it does not have at least six points in its own neighborhood. We continue with the next point. As you can see, it is also a core point, and all points around it which are not core points are border points. Next core point, and next core point. Let's pick this point. You can see it is not a core point, nor is it a border point, so we label it as an outlier. What is an outlier? An outlier is a point that is not a core point and is also not close enough to be reachable from a core point.

We continue and visit all the points in the dataset, labeling them as either core, border, or outlier. The next step is to connect core points that are neighbors and put them in the same cluster. So, a cluster is formed as at least one core point, plus all reachable core points, plus all their border points. This simply shapes all the clusters and finds the outliers as well.

Let's review this one more time to see why DBSCAN is cool. DBSCAN can find arbitrarily shaped clusters. It can even find a cluster completely surrounded by a different cluster. DBSCAN has a notion of noise and is robust to outliers. On top of that, DBSCAN does not require one to specify the number of clusters, such as K in k-Means, which makes it very practical for use in many real-world problems. This concludes this video. Thanks for watching.
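The properties reviewed above can be demonstrated with scikit-learn's `DBSCAN` on the classic two-moons dataset, where k-Means fails because the clusters are arbitrarily shaped. The dataset and the `eps`/`min_samples` values below are assumed for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

# Two interleaving half-circles: arbitrarily shaped, non-convex clusters.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=42)

# eps plays the role of R, min_samples the role of M from the video.
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# DBSCAN labels noise points as -1; exclude them when counting clusters.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())
```

Notice that the number of clusters is an output here, not an input: DBSCAN discovers both moons from density alone, and any stray points get the noise label -1.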