I would like to analyse a set of hundreds of thousands of product images (clothing, electronic goods, etc.) and retrieve the dominant colours in each. I’m only interested in the top 3 or 4 colours. The aim is to achieve a degree of certainty that image x is mostly red or image y is mostly orange and blue.
The images are likely to be colour JPEGs of reasonable quality and approximately 100 KB in size.
I would like to use C# and the solution should run on a Linux server, preferably using open source libraries. What image processing algorithms or techniques might help me achieve this?
The general method for this sort of thing is called an “Image Histogram”. You collect the set of color values that occur in the image and count how many pixels have each one. In your particular case, you’d want a “Color Histogram”.
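A color histogram can be sketched in C# roughly like this. It assumes the JPEG has already been decoded into (R, G, B) triples by whatever open-source image library you choose (decoding isn't shown). Each channel is quantized to a few levels so that near-identical shades share a bin; `LevelsPerChannel = 4` is a hypothetical choice, not a recommendation.

```csharp
using System.Collections.Generic;

public static class ColorHistogram
{
    // Levels kept per channel (hypothetical choice): 4 levels => 4 * 4 * 4 = 64 bins.
    public const int LevelsPerChannel = 4;

    // Maps an 8-bit channel value (0-255) to a level 0..LevelsPerChannel-1.
    static int Quantize(byte v) => v * LevelsPerChannel / 256;

    // Counts how many pixels fall into each quantized RGB bin.
    // Pixels are assumed to be already decoded into (R, G, B) triples.
    public static Dictionary<int, int> Build(IEnumerable<(byte R, byte G, byte B)> pixels)
    {
        var counts = new Dictionary<int, int>();
        foreach (var (r, g, b) in pixels)
        {
            // Pack the three channel levels into a single integer key.
            int key = (Quantize(r) * LevelsPerChannel + Quantize(g)) * LevelsPerChannel + Quantize(b);
            counts.TryGetValue(key, out int n); // n is 0 when the bin is new
            counts[key] = n + 1;
        }
        return counts;
    }
}
```

With 4 levels, (255, 0, 0) and (250, 10, 10) land in the same bin, so slightly different reds are counted together instead of being spread over the roughly 16.7 million possible RGB values.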
Your chief challenge will be less the counting itself and more one of segmentation, since the boundaries between named colors are a matter of definition. You’d need to decide which values count as “red”, for instance. From there it’s easy enough to sort the counts to determine the majority colors, if any, so you’d want an ordered collection, unless you use Moore’s voting algorithm instead. That algorithm only finds a color held by more than half of the pixels, and it needs a second pass to confirm the candidate it returns. If you want more than one color, you’d have to remove each majority color and run the algorithm again. It also can’t recognize tied colors in one pass.
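The segmentation and ranking steps might look like the following sketch, again assuming already-decoded pixels. The color names and hue boundaries in the hypothetical `NameOf` are illustrative assumptions, not standard definitions; choosing them is the hard part. `Top` ranks the named colors with an ordinary sort, and `Majority` is the Boyer-Moore majority vote with its verification pass.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class DominantColors
{
    // Hypothetical segmentation: maps a pixel to a color name via its HSV hue.
    // The names and every threshold below are illustrative assumptions, not a standard.
    public static string NameOf(byte r, byte g, byte b)
    {
        double rf = r / 255.0, gf = g / 255.0, bf = b / 255.0;
        double max = Math.Max(rf, Math.Max(gf, bf));
        double min = Math.Min(rf, Math.Min(gf, bf));
        double delta = max - min;

        // Dark or unsaturated pixels have no meaningful hue, so name them first.
        if (max < 0.2) return "black";
        if (delta < 0.15) return max > 0.8 ? "white" : "grey";

        // Standard RGB -> HSV hue, in degrees [0, 360).
        double hue;
        if (max == rf) hue = 60 * (((gf - bf) / delta) % 6);
        else if (max == gf) hue = 60 * ((bf - rf) / delta + 2);
        else hue = 60 * ((rf - gf) / delta + 4);
        if (hue < 0) hue += 360;

        if (hue < 15 || hue >= 345) return "red";
        if (hue < 45) return "orange";
        if (hue < 70) return "yellow";
        if (hue < 170) return "green";
        if (hue < 260) return "blue";
        if (hue < 290) return "purple";
        return "pink";
    }

    // Ranks named colors by pixel count with an ordinary (stable) descending sort.
    public static List<(string Name, int Count)> Top(IEnumerable<(byte R, byte G, byte B)> pixels, int n)
    {
        return pixels
            .GroupBy(p => NameOf(p.R, p.G, p.B))
            .Select(grp => (Name: grp.Key, Count: grp.Count()))
            .OrderByDescending(t => t.Count)
            .Take(n)
            .ToList();
    }

    // Boyer-Moore majority vote: returns the label held by more than half the items, or null.
    public static string? Majority(IReadOnlyList<string> labels)
    {
        // Pass 1: find a candidate (always yields one, even when there is no majority).
        string? candidate = null;
        int count = 0;
        foreach (var label in labels)
        {
            if (count == 0) { candidate = label; count = 1; }
            else if (label == candidate) count++;
            else count--;
        }
        if (candidate == null) return null;

        // Pass 2: verify that the candidate really covers more than half the items.
        int occurrences = 0;
        foreach (var label in labels)
            if (label == candidate) occurrences++;
        return occurrences * 2 > labels.Count ? candidate : null;
    }
}
```

For an image that is exactly half red and half blue, `Majority` returns null while `Top(pixels, 3)` still reports both colors, which is one reason a sorted count may suit a “top 3 or 4 colours” requirement better.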