I’m trying to use the BOVW model for image classification (https://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision)
Here’s what I’ve done so far:
- got an array of labeled numpy arrays by loading images out of my dataset
- turned the images into lists of their features using SIFT
- created a vocabulary of visual words by stacking the features from all images into a single array and feeding it to k-means clustering
Here’s how I create the vocabulary:

    from sklearn.cluster import KMeans

    def get_vocab(imgs, n_clusters=100):
        # collect the SIFT descriptors of every image into one flat list
        descriptors = [f for img in imgs for f in extractFeatures(img)]
        km = KMeans(n_clusters=n_clusters).fit(descriptors)
        # each cluster center is one "visual word"
        return km.cluster_centers_
If I understood the model correctly, the next step is to compute a frequency histogram for each image: an array storing, for every visual word, how many times it occurs in that image.
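Here’s a sketch of what I think that step looks like, assuming the vocabulary is the array of cluster centers returned above (the function name and shapes are my guesses):

    import numpy as np

    def compute_histogram(features, vocab):
        # features: (n_features, 128) SIFT descriptors of one image
        # vocab:    (n_words, 128) cluster centers from k-means
        # pairwise distances from every feature to every visual word
        dists = np.linalg.norm(features[:, None, :] - vocab[None, :, :], axis=2)
        # assign each feature to its nearest word, then count occurrences
        words = dists.argmin(axis=1)
        return np.bincount(words, minlength=len(vocab))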
My questions are:
- How do I “count” the visual words occurring inside an image? Do I take a feature (a numpy array), find the visual word closest to it, and increment that word’s count?
- How do I find the “closest” visual word to a given feature?
- Can I “reuse” the data from the clustering algorithm (i.e. find which cluster a feature has been put into)?
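For what it’s worth, I noticed scikit-learn’s KMeans has a predict method that assigns new samples to their nearest cluster center, so maybe I should keep the fitted model instead of only its centers (the random data below just stands in for SIFT descriptors):

    import numpy as np
    from sklearn.cluster import KMeans

    # toy 128-dim "descriptors" standing in for real SIFT features
    rng = np.random.default_rng(0)
    descriptors = rng.random((50, 128))

    km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(descriptors)

    # predict() returns the index of the nearest cluster center for each
    # feature, so counting those indices gives the frequency histogram
    image_features = rng.random((10, 128))
    words = km.predict(image_features)
    hist = np.bincount(words, minlength=km.n_clusters)

Is that the intended way to do it, or is recomputing distances by hand the usual approach?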