SIFT/SURF BoW for a large number of clusters

If you spend some time browsing, you can find several examples of Python SIFT/SURF bag-of-words (BoW) classifiers on the internet. They use clustering (usually K-Means, via sklearn or cv2) to build a dictionary of visual words from SIFT/SURF features. However, most of the sample code I found can’t properly handle a large number (> 100) of visual words/clusters, while some papers (such as this one) show that the best results are achieved with 2000+ clusters.

Building the visual dictionary with cv2.BOWKMeansTrainer is extremely slow with > 100 clusters. Using sklearn.cluster.KMeans solves the speed issue, but it requires a huge amount of memory (8 GB of RAM is still insufficient to handle > 400 clusters). That’s where sklearn.cluster.MiniBatchKMeans comes into the picture: it updates the cluster centers from small random batches of descriptors instead of the full set on every iteration.
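To make the idea concrete, here is a minimal sketch of building a visual dictionary with MiniBatchKMeans and turning one image's descriptors into a BoW histogram. It is an illustration under stated assumptions, not the code from the experiments: the helper names `build_vocabulary` and `bow_histogram` are hypothetical, the batch size and cluster count are arbitrary, and random 128-dimensional arrays stand in for real SIFT descriptors (which you would normally extract per image with cv2).

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def build_vocabulary(descriptor_list, n_clusters, batch_size=1024, seed=0):
    """Cluster the descriptors of all images into n_clusters visual words.

    descriptor_list: list of (n_keypoints_i, dim) arrays, one per image.
    Hypothetical helper; parameter values are illustrative only.
    """
    all_descriptors = np.vstack(descriptor_list).astype(np.float32)
    kmeans = MiniBatchKMeans(
        n_clusters=n_clusters,
        batch_size=batch_size,
        n_init=3,
        random_state=seed,
    )
    kmeans.fit(all_descriptors)
    return kmeans


def bow_histogram(descriptors, kmeans):
    """Map one image's descriptors to a normalized visual-word histogram."""
    words = kmeans.predict(descriptors.astype(np.float32))
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)


# Synthetic stand-in for SIFT output: 20 "images", 50-99 descriptors each.
rng = np.random.default_rng(0)
descriptor_list = [
    rng.random((int(rng.integers(50, 100)), 128)) for _ in range(20)
]

vocabulary = build_vocabulary(descriptor_list, n_clusters=50)
features = np.array([bow_histogram(d, vocabulary) for d in descriptor_list])
```

Each row of `features` is a fixed-length vector (one bin per visual word) that can be fed to any classifier. For a real dataset, the only change should be replacing the synthetic arrays with the descriptors returned by your SIFT/SURF extractor and raising `n_clusters`.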

With MiniBatchKMeans, I was able to complete the whole process (preprocessing the raw data of 2000+ images, building the SIFT dictionary, and cross-validating 6 classifiers) within 52 minutes (still quite long, but acceptable 😜). With SURF, it takes around 45 minutes. The complete main files for both experiments can be found here and here.