I want to use machine learning to identify defects on clothing. I’ve assembled a dataset of roughly 1000 images per class (clean, stained, torn, pet hair, etc.). My initial approach was to fine-tune a Convolutional Neural Network (CNN) model, specifically InceptionResNetV3. This worked well overall, but the model still struggles with very small details. Also, sometimes it confuses shadows with stains or wrinkles.
Here’s what I’m looking for:
Improved Performance: How can I further enhance the model’s accuracy, especially with small defects?
Alternative Approaches: Are there other ML techniques or model architectures you would recommend for this task? Would annotating each image with bounding boxes and using an object detection approach work better? Or would Vision Transformers (ViTs) help?
Note: It’s crucial that the solution works effectively with high-resolution images and accurately captures small details.
Here are two images to better explain what I’m trying to achieve:
- Here I want my model to classify this t-shirt as stained, because of the stain close to the neck part.
  Link to the t-shirt image
- Here I want my model to classify this t-shirt as torn, because of the small hole close to the waist level.
  Link to the t-shirt image
To explain what I have done so far:
- To prevent possible confusion caused by the background, I removed the backgrounds from all the training images using the “rembg” library.
- I augmented the images twice to obtain around 3000 images per class.
- I fine-tuned the network I mentioned with an input size of 1200×1600 pixels and trained only the last 50 layers.
- For inference, I split the original input image (typically around 3000×4000 pixels) into four tiles and fed them to the model separately.
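
For reference, the four-way split in the last step can be sketched roughly as below. This is only an illustration, not the exact code used: it assumes the image is a NumPy array in height × width × channel layout, that "3000×4000" means 3000 pixels wide by 4000 pixels tall, and that the split is a plain 2×2 grid without overlap. The function name `split_into_quadrants` is hypothetical.

```python
from typing import List

import numpy as np


def split_into_quadrants(image: np.ndarray) -> List[np.ndarray]:
    """Split an H x W x C image into four tiles in a 2x2 grid.

    Tiles are returned in the order: top-left, top-right,
    bottom-left, bottom-right. No overlap is used (assumption).
    """
    height, width = image.shape[:2]
    mid_y, mid_x = height // 2, width // 2
    return [
        image[:mid_y, :mid_x],   # top-left
        image[:mid_y, mid_x:],   # top-right
        image[mid_y:, :mid_x],   # bottom-left
        image[mid_y:, mid_x:],   # bottom-right
    ]


# Dummy array standing in for a photo that is 3000 px wide and 4000 px tall.
image = np.zeros((4000, 3000, 3), dtype=np.uint8)
tiles = split_into_quadrants(image)
print([tile.shape for tile in tiles])
```

Under these assumptions each tile is 1500×2000 pixels, so it still has to be resized to the 1200×1600 network input. A defect that sits on a tile border is cut in two by this kind of non-overlapping split.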