Clustering Using OPTICS

It also cannot be smaller than the core distance of o. Although the MinPts parameter is used in these calculations, the idea is that it should not have much of an impact, because all distances would scale at roughly the same rate. We will use these definitions to create our reachability plot, which will then be used to extract the clusters.

First, we calculate the core distance of every data point in the set. Then we loop through the entire data set and update the reachability distances, processing each point only once. We only update the reachability distances of points that stand to be improved and have not yet been processed. This is because when we process a point, we fix its position in the ordering as well as its reachability distance. The next data point chosen for processing is the unprocessed point with the smallest reachability distance. This is how the algorithm keeps clusters near each other in the output ordering. An example of the raw reachability plot is shown below.

The next step is to extract the actual cluster labels from the plot. The most common way of doing this is to search for "valleys" in the plot, using local minima and maxima. A few more parameters could come into play here, depending on the method taken.

See below for a comparison of some generated sample data and the resulting OPTICS labels and reachability plot. Colored points are those assigned to clusters, while grey ones represent noise. This example was taken directly from the Scikit-Learn development version.

Notice that a good number of points are identified as noise in this generated example. Their density is similar to that of the yellow cluster, but they are not recognized in this extraction because it focuses on separating the denser regions instead.
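The ordering procedure described above can be sketched in plain Python. This is a minimal illustration, not the Scikit-Learn implementation: the function name `optics_order` and its parameters are hypothetical, it uses a brute-force distance matrix, it assumes the core distance counts the point itself among its MinPts neighbors, and it stops at the ordering and reachability values without extracting cluster labels.

```python
import math


def optics_order(points, min_pts, eps=math.inf):
    """Return (order, reach, core) for a list of coordinate tuples.

    Hypothetical helper for illustration only; O(n^2) time and memory.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Core distance: distance to the min_pts-th nearest neighbor.
    # Assumption: the point itself counts as its own first neighbor.
    core = []
    for i in range(n):
        neighbors = sorted(dist[i])
        cd = neighbors[min_pts - 1] if min_pts <= n else math.inf
        core.append(cd if cd <= eps else math.inf)

    reach = [math.inf] * n      # reachability distance, undefined = inf
    processed = [False] * n
    order = []

    for _ in range(n):
        # Pick the unprocessed point with the smallest reachability.
        # If all remaining values are inf, the lowest index is taken.
        i = min((j for j in range(n) if not processed[j]),
                key=lambda j: reach[j])
        processed[i] = True     # ordering and reachability are now fixed
        order.append(i)

        if math.isinf(core[i]):
            continue            # not a core point: nothing to update

        for j in range(n):
            if processed[j] or dist[i][j] > eps:
                continue
            # Reachability cannot be smaller than the core distance of i.
            candidate = max(core[i], dist[i][j])
            if candidate < reach[j]:   # only update if it improves
                reach[j] = candidate

    return order, reach, core


# Two small groups, deliberately interleaved in the input.
points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
order, reach, core = optics_order(points, min_pts=2)
print(order)                       # [0, 2, 4, 1, 3, 5]
print([reach[i] for i in order])   # values to draw as the reachability plot
```

In the output ordering the three points near the origin come first and the three points near (10, 10) follow, even though they alternate in the input. The single large reachability value at the fourth position is the "peak" separating the two valleys. A production implementation would typically use a spatial index and a priority queue instead of the full distance matrix.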