I have a dataset of 1000 labeled CAPTCHA images from my college website, and I want to train a model that can accurately solve similar CAPTCHAs it has not seen before. Each CAPTCHA typically consists of 6 alphanumeric characters (A-Z, 0-9). I have tried several approaches, but none of the models achieves high accuracy on unseen images.
Dataset:
- Number of Samples: 1000 labeled images.
- CAPTCHA Type: 6 alphanumeric characters.
- Image Example: [Attach an example CAPTCHA image].
What I’ve tried:
Convolutional Neural Network (CNN):
- Built a CNN with multiple convolutional and dense layers.
- Flattened the convolutional output and fed it into a separate dense layer for each character.
- Result: poor generalization on unseen CAPTCHAs.
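A minimal sketch of the target encoding that a per-character, multi-head setup like this implies. The alphabet ordering (A-Z, then 0-9) and the helper names `encode_label`, `decode_label`, and `sequence_accuracy` are assumptions made for illustration, not taken from the original training code. It also separates whole-CAPTCHA accuracy from per-character accuracy, since one wrong character fails the whole sequence.

```python
import string

# Assumed alphabet ordering: A-Z first, then 0-9 (36 classes per head).
ALPHABET = string.ascii_uppercase + string.digits
CAPTCHA_LEN = 6
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_label(text):
    """Map a 6-character CAPTCHA string to six class indices, one per output head."""
    if len(text) != CAPTCHA_LEN:
        raise ValueError(f"expected {CAPTCHA_LEN} characters, got {len(text)}")
    return [CHAR_TO_INDEX[ch] for ch in text.upper()]


def decode_label(indices):
    """Map six class indices back to the CAPTCHA string."""
    return "".join(ALPHABET[i] for i in indices)


def sequence_accuracy(predicted, expected):
    """Fraction of CAPTCHAs in which every character is correct."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)
```

Under this ordering, `encode_label("A1B2C3")` gives `[0, 27, 1, 28, 2, 29]`. Reporting `sequence_accuracy` alongside per-character accuracy makes it clearer how far the model is from solving complete CAPTCHAs.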
Transfer Learning:
- Used MobileNetV2 as the feature extractor with a custom head.
- Adapted the input to what MobileNetV2 expects (grayscale to RGB conversion).
- Result: the model was prone to overfitting, which I attribute to the limited data.
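The grayscale-to-RGB step can be sketched as below, assuming NumPy arrays in channels-last layout. MobileNetV2's pretrained ImageNet weights expect three input channels, so the single grayscale channel is simply repeated. The function name `gray_to_rgb` and the 50x200 image size in the usage lines are hypothetical.

```python
import numpy as np


def gray_to_rgb(batch):
    """Repeat the grayscale channel three times.

    Accepts shape (N, H, W) or (N, H, W, 1) and returns (N, H, W, 3).
    """
    batch = np.asarray(batch)
    if batch.ndim == 3:
        # Add the missing channel axis: (N, H, W) -> (N, H, W, 1).
        batch = batch[..., np.newaxis]
    if batch.ndim != 4 or batch.shape[-1] != 1:
        raise ValueError(f"expected a single-channel batch, got shape {batch.shape}")
    return np.repeat(batch, 3, axis=-1)


# Usage with a hypothetical image size of 50x200 pixels.
batch = np.zeros((4, 50, 200), dtype=np.float32)
rgb = gray_to_rgb(batch)
print(rgb.shape)  # (4, 50, 200, 3)
```

Repeating the channel adds no new information, so this only makes the input compatible with the pretrained backbone; it does not by itself address the overfitting.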
I want to train a model that:
- Can handle unseen CAPTCHAs with high accuracy.
- Effectively decodes sequences of alphanumeric characters.
What is the best approach or architecture for solving CAPTCHAs like this? Should I use:
- CNN for feature extraction followed by LSTM for sequence decoding?
- An end-to-end architecture such as a CRNN (convolutional recurrent neural network)?
- Any other approach that works better for limited datasets?
If possible, could you suggest a complete model architecture, preprocessing steps, or loss function setup that might help? Any tips for augmenting a small dataset for this task?