To let my CNN detect objects in high-resolution images without sacrificing speed or burning a lot of compute, I came up with this process:
The high-resolution image is first handed to a CNN I’ll call CNN-A. CNN-A works on a downscaled copy of the image with larger kernels, so it does a coarse but cheap pass. Wherever it flags a match, the detection’s bounding box is translated back into the original image’s coordinates, and that crop of the full-resolution image is sent to CNN-B to be examined with smaller kernels.
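To make sure I'm describing the idea clearly, here's a minimal sketch of the cascade. `coarse_detector` and `fine_detector` are hypothetical stand-ins for CNN-A and CNN-B (they aren't real models); the part I actually care about is the box-coordinate translation between the downscaled and original images:

```python
import numpy as np

def coarse_detector(small_img):
    # Hypothetical stand-in for CNN-A: returns candidate boxes
    # (x, y, w, h) in the coordinates of the DOWNSCALED image.
    return [(10, 12, 8, 8)]

def fine_detector(crop):
    # Hypothetical stand-in for CNN-B: confirms or rejects a crop.
    return crop.size > 0

def cascade_detect(image, scale=4):
    # Stage 1: cheap coarse pass on a downscaled copy
    # (simple striding here; real code would use a proper resize).
    small = image[::scale, ::scale]
    confirmed = []
    for (x, y, w, h) in coarse_detector(small):
        # Translate the box back to original-image coordinates.
        X, Y, W, H = x * scale, y * scale, w * scale, h * scale
        crop = image[Y:Y + H, X:X + W]
        # Stage 2: re-examine the crop at full resolution.
        if fine_detector(crop):
            confirmed.append((X, Y, W, H))
    return confirmed

image = np.zeros((256, 256), dtype=np.uint8)
print(cascade_detect(image))  # [(40, 48, 32, 32)]
```

So only the flagged crops ever reach CNN-B, and the full-resolution image is never run through a detector end to end.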
My hope is that this gains computational efficiency by reducing the amount of data processed at each stage while still keeping detections accurate. However, I’m very new to all of this Convolutional Neural Network stuff. Does anyone know if this could be a viable approach?