I am working with vision transformers (ViT) for image segmentation, but I am unsure which segmentation head to use.
I understand that I need a vision transformer as the backbone, plus a segmentation head that turns the backbone's learned representations of an input image into a segmentation map. Can any segmentation head be paired with any ViT backbone, or are certain heads tied to specific ViT backbones?
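To make the setup concrete, here is a minimal NumPy sketch of what I mean by "backbone plus head". It assumes a plain ViT backbone that emits one token per image patch (class token already dropped), and it uses random numbers in place of real backbone outputs and trained weights. The head shown is just a per-patch linear classifier followed by nearest-neighbour upsampling; the function name `linear_seg_head` and all shapes are my own placeholders, not taken from any library.

```python
import numpy as np


def linear_seg_head(tokens, weight, bias, grid_hw, patch_size):
    """Hypothetical linear head: patch tokens -> per-pixel class map.

    tokens:     (N, D) patch embeddings from a plain ViT (no class token).
    weight:     (D, C) and bias: (C,) of a per-patch linear classifier.
    grid_hw:    (H // P, W // P), the patch grid of the input image.
    patch_size: P, the side length of one patch in pixels.
    """
    gh, gw = grid_hw
    if tokens.shape[0] != gh * gw:
        raise ValueError("token count must equal grid height * grid width")

    logits = tokens @ weight + bias        # (N, C): class scores per patch
    logits = logits.reshape(gh, gw, -1)    # (gh, gw, C): restore 2D layout
    # Nearest-neighbour upsampling: every patch score covers P x P pixels.
    logits = logits.repeat(patch_size, axis=0).repeat(patch_size, axis=1)
    return logits.argmax(axis=-1)          # (H, W): predicted class per pixel


# Stand-ins for a real backbone output and trained weights (assumed sizes).
rng = np.random.default_rng(0)
H = W = 224
P, D, C = 16, 768, 21
grid = (H // P, W // P)
tokens = rng.standard_normal((grid[0] * grid[1], D))
weight = rng.standard_normal((D, C)) * 0.02
bias = np.zeros(C)

seg = linear_seg_head(tokens, weight, bias, grid, P)
print(seg.shape)  # (224, 224)
```

What I am asking is whether a head like this, or a heavier convolutional or transformer decoder, can be attached to any ViT, or whether the choice depends on what the particular backbone outputs.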
Appreciate anyone who can offer some insight!