Given a set of images on a webpage, what’s a good way to pick a compelling image that is also representative of the webpage? The use case is displaying an image alongside a description of the page after a user includes its URL in a status update.
I’ve found some techniques to find the most interesting part of a single image. Example: http://berk.es/smartcropper/
So maybe I can use this notion of entropy to compare images in a set?
I am programming in Ruby, so a pure Ruby solution is preferable, but I am open to others.
This is just a quick rundown of ideas. I don’t have any implementations to share.
Best advice: use side-channel information:
- Relative position (layout) of the image on the page.
- Look at the scripts attached to the image. Is it clickable (onclick)?
- Does it have a title?
- Does it have a title that shares important words with the title of the page?
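The checks above could be folded into a single score per image. What follows is only a rough sketch: it assumes the page has already been parsed into plain hashes, and the keys (`:position`, `:onclick`, `:title`), the stopword list, and the weights are all made up for illustration and would need tuning.

```ruby
# Each candidate is a plain Hash of facts already extracted from the page
# (for example with an HTML parser). All keys and weights are hypothetical.
STOPWORDS = %w[a an and the of in on for to with].freeze

# Lowercased words of a string, minus very common words.
def significant_words(text)
  text.to_s.downcase.scan(/[a-z0-9]+/) - STOPWORDS
end

def side_channel_score(image, page_title)
  score = 0.0
  # Earlier position on the page counts for more (position is 0-based).
  score += 1.0 / (1 + image[:position])
  # A click handler suggests the image matters to the page.
  score += 1.0 if image[:onclick]

  title = image[:title].to_s
  unless title.strip.empty?
    # Having any title at all is a weak signal.
    score += 0.5
    # Sharing important words with the page title is a strong one.
    shared = significant_words(title) & significant_words(page_title)
    score += 2.0 if shared.any?
  end
  score
end

page_title = "Hiking the Alps in Winter"
images = [
  { src: "logo.png", position: 0, onclick: false, title: nil },
  { src: "alps.jpg", position: 2, onclick: true,  title: "Winter hiking in the Alps" },
  { src: "ad.gif",   position: 1, onclick: true,  title: "Buy now" }
]

best = images.max_by { |image| side_channel_score(image, page_title) }
puts best[:src]
```

With these made-up weights the title overlap dominates, so a well-titled image further down the page can beat a logo at the top.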
Not-so-best advice: use image information.
Not-so-best because these are known to be difficult, open-ended problems, so a non-scientist will find them painful to implement or even use.
- Images with a beautiful histogram (e.g. suns, beaches, blue sky)
- … any of the research publications mentioned by others in the comments.
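As a starting point for the histogram route, the entropy idea from the question can be computed in pure Ruby. This is a minimal sketch that assumes each image has already been decoded into an array of grayscale intensities (0..255); the decoding step and the file names are placeholders, and higher entropy only means "more varied", not necessarily "more compelling".

```ruby
# Shannon entropy (in bits) of a list of grayscale pixel values (0..255).
# Decoding an image file into such a list is left to an image library.
def shannon_entropy(pixels)
  return 0.0 if pixels.empty?

  # Build the intensity histogram.
  counts = Hash.new(0)
  pixels.each { |value| counts[value] += 1 }

  # Sum -p * log2(p) over the non-empty histogram bins.
  total = pixels.size.to_f
  counts.values.sum do |count|
    prob = count / total
    -prob * Math.log2(prob)
  end
end

# Hypothetical candidates: image name => pixel intensities.
candidates = {
  "flat.png"     => [128] * 16,
  "two_tone.png" => [0] * 8 + [255] * 8,
  "varied.png"   => (0...16).map { |i| i * 16 }
}

best_name, _pixels = candidates.max_by { |_name, pixels| shannon_entropy(pixels) }
puts best_name
```

Entropy alone will happily favour noise, so it is probably better used as one signal among several than as the sole ranking.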
COTS (off-the-shelf) solutions:
- Images with a face (mugshot). Search for “OpenCV face recognition Ruby” and you’ll find some.
- Instructions: http://www.sitepoint.com/detecting-faces-with-ruby-ffi-in-a-nutshell/ (No comments on this since I’m not a Ruby programmer)
Advice for the true hackers:
https://harthur.github.io/brain/
This is a submission (as seen on Hacker News) that trains a simple neural network (3 input nodes, 3 nodes in a middle layer) based on whether the user prefers black text or white text against an RGB background.
A similar do-it-yourself approach can be applied to an image histogram picker. However, I don’t know how useful the end result would be for your application.
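To give a feel for the do-it-yourself route in Ruby, here is a toy sketch. Note that it swaps in plain logistic regression rather than a neural network like the one in the linked project, and it assumes each image has been reduced to three made-up features (mean red, green, and blue, scaled to 0..1) with a liked/disliked label collected from users. The class name, features, and training data are all hypothetical.

```ruby
# Logistic regression trained with batch gradient descent.
# Features (hypothetical): mean red, green, blue of an image, each in 0..1.
# Labels: 1 if the user liked the image, 0 otherwise.
class PreferenceModel
  def initialize(feature_count)
    @weights = Array.new(feature_count, 0.0)
    @bias = 0.0
  end

  # Estimated probability that the user likes an image with these features.
  def predict(features)
    z = @bias + @weights.zip(features).sum { |w, x| w * x }
    1.0 / (1.0 + Math.exp(-z))
  end

  def train(samples, epochs: 2000, learning_rate: 0.5)
    n = samples.size.to_f
    epochs.times do
      grad_w = Array.new(@weights.size, 0.0)
      grad_b = 0.0
      # Accumulate the gradient of the log loss over the whole batch.
      samples.each do |features, label|
        error = predict(features) - label
        features.each_with_index { |x, i| grad_w[i] += error * x }
        grad_b += error
      end
      # Step against the averaged gradient.
      @weights = @weights.each_with_index.map { |w, i| w - learning_rate * grad_w[i] / n }
      @bias -= learning_rate * grad_b / n
    end
    self
  end
end

# Made-up training data: this user tends to like bluish images.
training = [
  [[0.2, 0.4, 0.9], 1],
  [[0.1, 0.5, 0.8], 1],
  [[0.3, 0.3, 0.7], 1],
  [[0.5, 0.5, 0.5], 0],
  [[0.8, 0.3, 0.2], 0],
  [[0.6, 0.6, 0.3], 0]
]

model = PreferenceModel.new(3).train(training)
puts model.predict([0.1, 0.3, 0.95]).round(2) # a bluish image
puts model.predict([0.9, 0.4, 0.1]).round(2)  # a reddish image
```

To pick from a set, you would score every candidate with `predict` and take the highest. A fuller histogram (more bins per channel) would be a natural next step, at the cost of needing more labelled examples.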