I have a large collection of images, around 200 thousand, and it grows every day. Some of these images are duplicates. Some are exactly the same, and others are cropped, rotated, shifted along the x or y axis, and so on. I would like to detect the duplicated images. By duplicate I mean that part of one image is exactly the same as part of another; I do not mean different photos taken from a different camera angle or at a different distance from the objects. If two images are duplicates, then they are either exactly the same or parts of them match, no matter whether they have been rotated, cropped, or otherwise transformed. I have added four example images below.
Please give me some suggestions for solving this problem. Feel free to correct me if I have done something wrong with the algorithms I used.
Here is what I have tried so far:
Hashing algorithms
Hashing algorithms work well if the images are exactly the same. However, if an image is cropped, hashing algorithms no longer treat the two images as duplicates. Here are the hash algorithms that I used:
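To illustrate the cropping failure, here is a minimal sketch. It assumes a simple average hash on a tiny synthetic grayscale image stored as nested lists; the helper names (`resize_nearest`, `average_hash`, `hamming`) are hypothetical, and this is a simplified illustration rather than the code I actually ran or any specific library's implementation.

```python
def resize_nearest(pixels, size):
    """Nearest-neighbour resize of a 2D grayscale grid to size x size."""
    h, w = len(pixels), len(pixels[0])
    return [[pixels[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def average_hash(pixels, size=4):
    """Shrink the image, then emit 1 for every pixel brighter than the mean."""
    small = resize_nearest(pixels, size)
    flat = [p for row in small for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)


def hamming(a, b):
    """Number of positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


# Synthetic 8x8 image: a bright 4x4 square in the top-left corner.
image = [[255 if r < 4 and c < 4 else 0 for c in range(8)] for r in range(8)]
exact_copy = [row[:] for row in image]
cropped = [row[3:] for row in image[3:]]  # drop the first 3 rows and columns

dist_copy = hamming(average_hash(image), average_hash(exact_copy))
dist_crop = hamming(average_hash(image), average_hash(cropped))
print(dist_copy, dist_crop)
```

The exact copy has a Hamming distance of zero, while the cropped version does not, even though every pixel of the crop comes from the original. With a strict threshold the crop is therefore missed.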
Deep learning algorithms
Deep learning algorithms are probably the most widely used technique in the literature. However, they are not sufficient for my problem because they also take contextual information into account. For example, they report high similarity for two different electricity distribution transformer images: when I extract embeddings of the first and fourth example images and calculate the cosine similarity between them, I get a similarity score of 0.63.
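For reference, the cosine similarity I mean is the standard one. The sketch below assumes the embeddings are plain lists of floats; `cosine_similarity` is a hypothetical helper and the two vectors are placeholder values, not real embeddings from my images.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Placeholder vectors for illustration only (NOT real image embeddings).
emb_a = [3.0, 4.0]
emb_b = [4.0, 3.0]

score = cosine_similarity(emb_a, emb_b)
print(round(score, 2))
```

The score only measures how closely the two vectors point in the same direction. It says nothing about whether the two images actually share any pixels, which is the property I care about.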
Template matching
Template matching works fine, but I think it is expensive, and I am not sure how to choose which image, or which part of an image, should be the template.
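To make the cost concern concrete, here is a minimal sketch of brute-force template matching. It assumes a sum-of-squared-differences score on small grayscale grids stored as nested lists; `match_template_ssd` is a hypothetical helper, and this is an illustration rather than the code I ran.

```python
def match_template_ssd(image, template):
    """Slide the template over the image and return (best_position, best_score).

    The score is the sum of squared differences; 0 means an exact match.
    """
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    best_pos, best_score = None, None
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            score = sum((image[r + i][c + j] - template[i][j]) ** 2
                        for i in range(h) for j in range(w))
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# Synthetic 6x6 image in which every pixel value is unique.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
# The template is an exact 2x2 crop starting at row 2, column 3.
template = [row[3:5] for row in image[2:4]]

position, score = match_template_ssd(image, template)
print(position, score)
```

For a single image pair the nested loops do roughly (H - h + 1) * (W - w + 1) * h * w pixel comparisons, and this plain version does not handle rotation at all. Repeating that for every pair among 200 thousand images is what makes me think the approach is too expensive.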
Histogram matching
I split the images into patches and calculated a histogram for each patch. Then I calculated the histogram similarity between these patches to match image parts. However, it did not give good results.
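A minimal sketch of the patch-histogram idea follows. It assumes non-overlapping square patches, a 4-bin intensity histogram, and histogram intersection as the similarity measure; all three are assumptions made for illustration, and the helper names are hypothetical.

```python
def split_patches(image, size):
    """Cut a 2D grayscale grid into non-overlapping size x size patches."""
    patches = {}
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            patches[(r, c)] = [row[c:c + size] for row in image[r:r + size]]
    return patches


def histogram(patch, bins=4, max_value=256):
    """Normalised intensity histogram of a patch."""
    counts = [0] * bins
    for row in patch:
        for p in row:
            counts[p * bins // max_value] += 1
    total = sum(counts)
    return [n / total for n in counts]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


image = [
    [0, 255, 0, 0],
    [255, 0, 255, 255],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
patches = split_patches(image, 2)
hists = {pos: histogram(p) for pos, p in patches.items()}

checker_vs_stripes = intersection(hists[(0, 0)], hists[(0, 2)])
dark_vs_bright = intersection(hists[(2, 0)], hists[(2, 2)])
print(checker_vs_stripes, dark_vs_bright)
```

In this toy example the checkerboard patch and the striped patch get a perfect score because a histogram ignores where pixels sit inside a patch. That may be part of why my results were poor, but I am not certain.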