Fuzzy logic principles can be used to cluster multidimensional data, assigning
each point a membership in each cluster center from 0 to 100 percent. This
can be more informative than traditional hard clustering, where every point
receives a single crisp label: a point lying between two clusters can instead
belong partly to each.
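To make "membership from 0 to 100 percent" concrete, here is a small sketch of the standard fuzzy c-means membership formula, u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m-1)), with fuzzifier m = 2 (the exponent passed to cmeans throughout this section). The helper name fuzzy_memberships is hypothetical, and data is laid out as (features, samples) to match how this section passes data to skfuzzy. Taking the argmax of the memberships recovers the crisp label a hard clustering would assign.

```python
import numpy as np

def fuzzy_memberships(data, centers, m=2.0):
    """Hypothetical helper: fuzzy c-means memberships of points in centers.

    data: (S, N) array (features x samples); centers: (C, S) array.
    Returns a (C, N) array whose columns sum to 1.
    """
    # Euclidean distance from every center to every point, shape (C, N)
    d = np.linalg.norm(centers[:, :, None] - data[None, :, :], axis=1)
    d = np.fmax(d, np.finfo(float).eps)  # avoid dividing by zero at a center
    # ratio[i, k, j] = (d_ij / d_kj) ** (2 / (m - 1))
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)

centers = np.array([[0.0, 0.0], [4.0, 0.0]])
point = np.array([[1.0], [0.0]])       # distance 1 to center 0, 3 to center 1
u = fuzzy_memberships(point, centers)  # shape (2, 1)
print(np.round(u[:, 0] * 100, 1))      # membership in percent: [90. 10.]
print(int(np.argmax(u[:, 0])))         # hard (crisp) label: 0
```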

Fuzzy c-means clustering is performed with skfuzzy.cmeans, and the output
from this function can be reused to classify new data according to the
calculated clusters (also known as prediction) with skfuzzy.cmeans_predict.
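As a rough picture of what cmeans computes, the following is a minimal NumPy sketch of fuzzy c-means. It alternates two updates: cluster centers as membership-weighted means (weights u**m), then memberships from the distances to those centers, and it stops once the membership matrix changes by less than error. It takes data as a (features, samples) array like skfuzzy does, but it is an illustrative re-implementation with hypothetical function names, not skfuzzy's own code.

```python
import numpy as np

def memberships(data, centers, m):
    """data: (S, N) features x samples; centers: (C, S). Returns u: (C, N)."""
    d = np.linalg.norm(centers[:, :, None] - data[None, :, :], axis=1)  # (C, N)
    d = np.fmax(d, np.finfo(float).eps)
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)

def fuzzy_cmeans(data, c, m=2.0, error=0.005, maxiter=1000, seed=0):
    """Hypothetical minimal fuzzy c-means returning (centers, u, fpc)."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, data.shape[1]))
    u /= u.sum(axis=0)  # each point's memberships must sum to 1
    for _ in range(maxiter):
        um = u ** m
        # Centers: membership-weighted mean of the points, shape (C, S)
        centers = (um @ data.T) / um.sum(axis=1, keepdims=True)
        u_new = memberships(data, centers, m)
        converged = np.linalg.norm(u_new - u) < error
        u = u_new
        if converged:
            break
    fpc = np.sum(u ** 2) / data.shape[1]  # fuzzy partition coefficient
    return centers, u, fpc

# Usage: two well-separated blobs, passed as a (2, 100) array
rng = np.random.default_rng(1)
blob_a = rng.normal([0.0, 0.0], 0.3, size=(50, 2))
blob_b = rng.normal([5.0, 5.0], 0.3, size=(50, 2))
data = np.vstack((blob_a, blob_b)).T
centers, u, fpc = fuzzy_cmeans(data, c=2)
print(np.round(centers[np.argsort(centers[:, 0])], 1))  # near (0, 0) and (5, 5)
```

Center ordering depends on the random initialization, which is why the usage line sorts the centers before printing them.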

Above is our test data. We see three distinct blobs. However, what would happen
if we didn't know how many clusters to expect, or if the data were not so
clearly clustered?
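The code that generated the data shown above is not part of this section. A minimal sketch that produces comparable data, assuming three 2-D Gaussian blobs with illustrative centers and spreads (these values are assumptions, not necessarily the ones behind the figure), and defining the xpts, ypts, and colors names that the clustering code relies on:

```python
import numpy as np

# Illustrative (assumed) blob parameters: (x, y) means and (x, y) std devs
centers = [(4.0, 2.0), (1.0, 7.0), (5.0, 6.0)]
sigmas = [(0.8, 0.3), (0.3, 0.5), (1.1, 0.7)]
npts = 200  # points per blob

rng = np.random.default_rng(42)
xs, ys, ls = [], [], []
for i, ((xmu, ymu), (xsigma, ysigma)) in enumerate(zip(centers, sigmas)):
    xs.append(rng.normal(xmu, xsigma, npts))
    ys.append(rng.normal(ymu, ysigma, npts))
    ls.append(np.full(npts, i))  # true blob index, kept only for reference

xpts = np.concatenate(xs)
ypts = np.concatenate(ys)
labels = np.concatenate(ls)

# At least 10 matplotlib colors, since the clustering loop fits up to 10 centers
colors = ['b', 'orange', 'g', 'r', 'c', 'm', 'y', 'k', 'brown', 'forestgreen']
```

Scatter-plotting xpts against ypts, colored by labels, gives a three-blob picture of the kind described above.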

Let's try clustering our data several times, with between 2 and 10 clusters.

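    import numpy as np
    import matplotlib.pyplot as plt
    import skfuzzy as fuzz

    # Assumes xpts and ypts (1-D arrays of point coordinates) and colors
    # (a list of at least 10 matplotlib colors) were defined when the test
    # data was generated.

    # Set up the loop and plot
    fig1, axes1 = plt.subplots(3, 3, figsize=(8, 8))
    alldata = np.vstack((xpts, ypts))  # shape (2, N): features x samples
    fpcs = []

    # 9 subplots numbered from 2, so ncenters runs from 2 to 10
    for ncenters, ax in enumerate(axes1.reshape(-1), 2):
        cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(
            alldata, ncenters, 2, error=0.005, maxiter=1000, init=None)

        # Store fpc values for later
        fpcs.append(fpc)

        # Plot assigned clusters, for each data point in training set
        cluster_membership = np.argmax(u, axis=0)
        for j in range(ncenters):
            ax.plot(xpts[cluster_membership == j],
                    ypts[cluster_membership == j], '.', color=colors[j])

        # Mark the center of each fuzzy cluster
        for pt in cntr:
            ax.plot(pt[0], pt[1], 'rs')

        ax.set_title('Centers = {0}; FPC = {1:.2f}'.format(ncenters, fpc))
        ax.axis('off')

    fig1.tight_layout()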

The FPC (fuzzy partition coefficient) shown in each subplot title lies between
0 and 1 (more precisely, between 1/c and 1 for c clusters), with 1 being best.
It measures how cleanly a given model describes our data. The loop above
clusters our data set, which we know contains three clusters, with 2 to 10
centers, plots each result, and stores every FPC in the fpcs list so the
coefficients can be compared. The number of centers that maximizes the FPC
describes the data best, so here we would expect the maximum at three centers.
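For reference, the FPC is the mean over all N points of each point's summed squared memberships, FPC = (1/N) Σ_i Σ_j u_ij^2. A crisp partition scores 1, and spreading every point evenly over c clusters gives the minimum, 1/c. The sketch below computes it from a (clusters, points) membership matrix and picks the number of centers with the largest FPC from a list ordered like fpcs (starting at 2 centers). The helper names and the three FPC values in the usage line are hypothetical, not results from this data.

```python
import numpy as np

def partition_coefficient(u):
    """FPC = (1/N) * sum of u_ij**2 for a (C, N) membership matrix."""
    return float(np.sum(u ** 2) / u.shape[1])

def best_ncenters(fpcs, start=2):
    """Number of centers whose FPC is largest, for fpcs ordered from `start`."""
    return start + int(np.argmax(fpcs))

# Crisp partition: every point fully in one cluster -> FPC = 1
crisp = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
# Maximally fuzzy partition with C = 2: every membership 1/2 -> FPC = 1/C
uniform = np.full((2, 3), 0.5)

print(partition_coefficient(crisp))    # 1.0
print(partition_coefficient(uniform))  # 0.5

# Hypothetical FPC values for 2, 3 and 4 centers
print(best_ncenters([0.70, 0.85, 0.72]))  # 3
```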

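Finally, we generate uniformly sampled data over this field and classify it
with cmeans_predict against the centers of the pre-existing 3-cluster model.
The trained centers are not updated; each new point simply receives a
membership in each existing cluster.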
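Conceptually, prediction holds the trained centers fixed and only computes memberships for the new points. With the centers fixed, the membership update no longer depends on previous memberships, so the sketch below evaluates it once in closed form. The function name and the three center coordinates are hypothetical stand-ins for a trained 3-cluster model, not skfuzzy's implementation. Because membership decreases with distance, the hardened label is simply the nearest center.

```python
import numpy as np

def predict_memberships(newdata, centers, m=2.0):
    """Memberships of new points (S, N) in fixed, already-trained centers (C, S).

    The centers are only read, never updated, so the trained model is unchanged.
    """
    d = np.linalg.norm(centers[:, :, None] - newdata[None, :, :], axis=1)  # (C, N)
    d = np.fmax(d, np.finfo(float).eps)
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)

# Hypothetical trained centers standing in for a 3-cluster model
centers = np.array([[4.0, 2.0], [1.0, 7.0], [5.0, 6.0]])

# Uniform points over [0, 10] in x and y, shaped (2, 1100) like newdata.T
rng = np.random.default_rng(0)
newdata = rng.uniform(0, 10, size=(1100, 2)).T

u = predict_memberships(newdata, centers)  # full fuzzy result, shape (3, 1100)
labels = np.argmax(u, axis=0)              # hardened only for visualization
```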

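    # Assumes numpy as np, matplotlib.pyplot as plt, skfuzzy as fuzz, and
    # alldata from the clustering loop are already defined.

    # Regenerate the fuzzy model with 3 cluster centers: after the loop, cntr
    # holds the centers of the last (10-cluster) run. Center ordering is
    # random in this algorithm, so the centers may change places between runs.
    cntr, u_orig, _, _, _, _, _ = fuzz.cluster.cmeans(
        alldata, 3, 2, error=0.005, maxiter=1000)

    # Generate uniformly sampled data spread across the range [0, 10] in x and y
    newdata = np.random.uniform(0, 1, (1100, 2)) * 10

    # Predict new cluster membership with `cmeans_predict` as well as
    # `cntr` from the 3-cluster model
    u, u0, d, jm, p, fpc = fuzz.cluster.cmeans_predict(
        newdata.T, cntr, 2, error=0.005, maxiter=1000)

    # Plot the classified uniform data. Note for visualization the maximum
    # membership value has been taken at each point (i.e. these are hardened,
    # not fuzzy results visualized) but the full fuzzy result is the output
    # from cmeans_predict.
    cluster_membership = np.argmax(u, axis=0)  # Hardening for visualization

    fig3, ax3 = plt.subplots()
    ax3.set_title('Random points classified according to known centers')
    for j in range(3):
        ax3.plot(newdata[cluster_membership == j, 0],
                 newdata[cluster_membership == j, 1], 'o',
                 label='series ' + str(j))
    ax3.legend()

    plt.show()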