I am trying to understand the architecture of Yolo. This image shows the transition of the image through the different convolution layers. I understand the transition of the first layer. How does a 448X448X3 image with a 7X7X3S2 kernel get an output of 112X112X192.
But from the second layer to the end of the architecture I did not understand, how the image gets its size after the convolution?
I I would appreciate it if someone could explain to me how the architecture works
איתן פרנסא is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.