I understand that the out_features of a Linear layer is often lower than its in_features, to compress the input into more meaningful features. But sometimes I see out_features higher than in_features, and sometimes they are equal.
For example, in this part of the Swin Transformer V2 architecture printed from PyTorch:
Sequential(
  (0): SwinTransformerBlockV2(
    (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (attn): ShiftedWindowAttentionV2(
      (qkv): Linear(in_features=768, out_features=2304, bias=True)   # Higher
      (proj): Linear(in_features=768, out_features=768, bias=True)   # Equal
      (cpb_mlp): Sequential(
        (0): Linear(in_features=2, out_features=512, bias=True)
        (1): ReLU(inplace=True)
        (2): Linear(in_features=512, out_features=24, bias=False)    # Lower
      )
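
To make my question concrete, here is a small sketch of what I guess the qkv layer above is doing (the tensor sizes are my own toy values, not taken from torchvision): out_features=2304 is 3 × 768, so I assume it packs the query, key, and value projections into a single Linear and splits the result afterwards.

import torch
import torch.nn as nn

embed_dim = 768                                        # channel size of the block
qkv = nn.Linear(embed_dim, 3 * embed_dim, bias=True)   # 768 -> 2304, the "higher" case
proj = nn.Linear(embed_dim, embed_dim, bias=True)      # 768 -> 768, the "equal" case

x = torch.randn(4, 49, embed_dim)                      # (batch, tokens per window, channels), toy sizes
q, k, v = qkv(x).chunk(3, dim=-1)                      # split 2304 back into three 768-dim tensors
print(q.shape, k.shape, v.shape)                       # each is torch.Size([4, 49, 768])
out = proj(v)                                          # maps back to the same 768 dimensions

If that reading is right, the "higher" out_features here is not really widening the features, it is just three same-sized projections stacked together, which is what made me wonder what the general rule is.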
I want to ask:
- What are the purposes of having higher, equal, or lower out_features in a network?
- Can you point me to some papers about this topic, or network architectures that use these patterns?
I'm just starting to learn about deep learning and AI, so if you can also recommend a course on building networks, it would be a great help.
Thank you so much.