I’m trying to figure out the best way to recommend images based on past classifications using k-means clusters. What I have done is mapped the RGB values of a set of images, performed a k-means cluster analysis on those RGB values, and attached a “rating” to each image. This has created Voronoi cells similar to this graph. I’ve stored the cluster centers and ratings into my “training set”.
The next step is to take a new image and make a recommendation based on the training set of previous images. I’m not sure how to proceed. Would I want to implement a collaborative filtering process? Or do I need to perform more processing on the data?
Not sure if it matters but I’m using Apache Spark for the project. Thanks!
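One simple baseline for the recommendation step described above is to reduce the new image to its mean colour and look up the rating of the nearest stored cluster center. The sketch below is plain Python rather than Spark, and the names `TRAINING_SET`, `mean_rgb` and `nearest_rating` as well as the centre values are made up for illustration; it is not necessarily the best approach, just the most direct use of the stored centers and ratings.

```python
import math

# Hypothetical training set: (cluster center in RGB, rating) pairs.
# The values are invented for illustration only.
TRAINING_SET = [
    ((200.0, 40.0, 40.0), "yes"),    # reddish cluster
    ((40.0, 40.0, 200.0), "no"),     # bluish cluster
    ((120.0, 120.0, 120.0), "yes"),  # gray cluster
]


def mean_rgb(pixels):
    """Average colour of an image given as a list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def nearest_rating(pixels, training_set=TRAINING_SET):
    """Return the rating attached to the center closest to the image's mean colour."""
    query = mean_rgb(pixels)
    _, rating = min(training_set, key=lambda cr: math.dist(query, cr[0]))
    return rating


new_image = [(210, 30, 50), (190, 60, 35), (205, 45, 45)]
print(nearest_rating(new_image))  # prints "yes"
```

This only answers "which Voronoi cell does the new image fall into", so it inherits every limitation of using raw RGB values as features.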
Edit: Collaborative filtering is probably not the best way to proceed, since the features being compared for products involve more than just ratings. I need to compare similarity between images instead. I’m guessing this would involve heavy matrix operations?
Edit 2: Some feedback here would be awesome. What I’m thinking is training on two datasets (images rated “yes” vs. images rated “no”) and then using Spark’s computeCost function to develop a value for variance/bias of an image being compared. The final step would then check whether the image is more similar to the “yes” dataset or the “no” dataset and make a recommendation accordingly. I’m new to machine learning, so I could be over-thinking it.
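The yes/no comparison above can be sketched as follows. For reference, computeCost on a Spark MLlib KMeansModel returns the sum of squared distances from each point to its nearest cluster center, so the plain-Python function `kmeans_cost` below is a stand-in for that quantity, not the Spark API itself. The centers, the images, and the function names are assumptions made up for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(pixels, centers):
    """Sum of squared distances from each pixel to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in pixels)


def recommend(pixels, yes_centers, no_centers):
    """Recommend when the image fits the "yes" centers at least as well as the "no" centers."""
    return kmeans_cost(pixels, yes_centers) <= kmeans_cost(pixels, no_centers)


# Hypothetical centers, as if learned separately from "yes" and "no" images.
yes_centers = [(220.0, 180.0, 60.0), (200.0, 90.0, 40.0)]
no_centers = [(30.0, 60.0, 150.0), (20.0, 20.0, 40.0)]

warm_image = [(210, 170, 70), (190, 100, 50)]
cool_image = [(40, 70, 140), (25, 30, 35)]

print(recommend(warm_image, yes_centers, no_centers))  # prints True
print(recommend(cool_image, yes_centers, no_centers))  # prints False
```

A lower cost means the image's pixels sit closer to that model's centers, so comparing the two costs gives a crude "more like yes or more like no" decision.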
Thought I’d answer my own question here with my final solution. I realized that although I could get some partially meaningful data from k-means clustering of RGB values, the problem was that the recommendation was not based on meaningful characteristics of the image. The RGB clustering could potentially be useful in the future for other aspects (such as lighting or “brightness”).
The final solution was using EigenFaces for image learning. The article below was the most helpful for me to understand the basics and get going:
Generating EigenFaces with Mahout SVD to recognize person faces
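For readers who want the gist of the Eigenfaces idea without Mahout, here is a minimal NumPy sketch: flatten each image to a vector, subtract the mean image, take the top principal directions via SVD, and classify a new image by its nearest neighbour in that reduced space. This is not the code from the linked article; the function names, the toy random "images" and the choice of three components are all assumptions for illustration.

```python
import numpy as np


def fit_eigenfaces(images, n_components):
    """images: (n_samples, n_pixels) array of flattened grayscale images."""
    mean = images.mean(axis=0)
    centered = images - mean
    # Rows of vt are the principal directions ("eigenfaces") of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(images, mean, components):
    """Express each image as weights over the eigenfaces."""
    return (images - mean) @ components.T


def predict(image, mean, components, train_weights, train_labels):
    """Label of the training image closest to `image` in eigenface space."""
    w = project(image[np.newaxis, :], mean, components)
    distances = np.linalg.norm(train_weights - w, axis=1)
    return train_labels[int(np.argmin(distances))]


# Toy data: two made-up 8x8 "image" prototypes plus small noise.
rng = np.random.default_rng(0)
base_yes = rng.random(64)
base_no = rng.random(64)
train = np.stack(
    [base_yes + 0.01 * rng.standard_normal(64) for _ in range(5)]
    + [base_no + 0.01 * rng.standard_normal(64) for _ in range(5)]
)
labels = ["yes"] * 5 + ["no"] * 5

mean, components = fit_eigenfaces(train, n_components=3)
weights = project(train, mean, components)

query = base_no + 0.01 * rng.standard_normal(64)
print(predict(query, mean, components, weights, labels))
```

Unlike the RGB clustering, the comparison here is made on structure shared across whole images, which is closer to the "meaningful characteristics" mentioned above.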