I’m trying to replicate, in PyTorch, the fully convolutional network (FCN) proposed in the 2016 paper “Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline”.
The paper states the following:
“In our problem settings, the FCN is performed as a feature extractor. Its final output still comes from the softmax layer. The basic block is a convolutional layer followed by a batch normalization layer [15] and a ReLU activation layer. The convolution operation is fulfilled by three 1-D kernels with the sizes {8,5,3} without striding. The basic convolution block is
y = W ⊗ x + b
s = BN(y)
h = ReLU(s)    (2)
⊗ is the convolution operator. We build the final networks by stacking three convolution blocks with the filter sizes {128, 256, 128} in each block. Unlike the MCNN and MC-CNN, We exclude any pooling operation. This strategy is also adopted in the ResNet [16] as to prevent overfitting. Batch normalization is applied to speed up the convergence speed and help improve generalization. After the convolution blocks, the features are fed into a global average pooling layer [17] instead of a fully connected layer, which largely reduces the number of weights. The final label is produced by a softmax layer (Figure 1(b)).”
Figure 1 being:
[Figure 1 from the paper]
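To make sure I’m reading the block definition right, here is a minimal PyTorch sketch of one basic block (my own code, not from the paper or tsai; the kernel size and channel counts are just the first block’s values, and the batch size / series length are arbitrary):

```python
import torch
import torch.nn as nn

# One "basic convolution block" as I read the quote: Conv1d -> BatchNorm1d -> ReLU,
# no striding, "same" padding so the time dimension is preserved (needs PyTorch >= 1.9).
def basic_block(in_channels: int, out_channels: int, kernel_size: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding="same"),
        nn.BatchNorm1d(out_channels),
        nn.ReLU(),
    )

block1 = basic_block(1, 128, 8)       # first block: 128 filters, kernel size 8
x = torch.randn(16, 1, 140)           # (batch, n_channels, n_samples)
print(block1(x).shape)                # torch.Size([16, 128, 140])
```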
As I understand it, the model architecture is the following:
*Note: n_samples depends on the padding; let’s assume “same” padding.*
input time series shape[n_channels,n_samples]
↓
{ (Basic convolutional block 1)
conv1d(k = 8, out_ch = 128)
↓
batch_norm()
↓
ReLU()
} -> shape[128,n_samples]
↓
{ (Basic convolutional block 2)
conv1d(k = 5, out_ch = 256)
↓
batch_norm()
↓
ReLU()
} -> shape[256,n_samples]
↓
{ (Basic convolutional block 3)
conv1d(k = 3, out_ch = 128)
↓
batch_norm()
↓
ReLU()
} -> shape[128,n_samples]
↓
Global average pooling: 1d or 2d?
Using 1d average pooling along the time axis: shape[128,n_samples] -> shape[128]
Using 1d average pooling along the channel axis: shape[128,n_samples] -> shape[n_samples]
Using 2d average pooling over both axes: shape[128,n_samples] -> shape[1]
None of these directly gives shape[n_classes] (e.g. shape[3] for 3 classes), which is what we want for the logits (see the shape check below).
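To make the shapes concrete, here is the shape check I ran (plain PyTorch; batch size 16 and n_samples = 140 are arbitrary values I picked):

```python
import torch

x = torch.randn(16, 128, 140)     # (batch, channels, n_samples) after the last conv block

gap_time = x.mean(dim=-1)         # pool over time      -> torch.Size([16, 128])
gap_chan = x.mean(dim=1)          # pool over channels  -> torch.Size([16, 140])
gap_both = x.mean(dim=(1, 2))     # pool over both axes -> torch.Size([16])

print(gap_time.shape, gap_chan.shape, gap_both.shape)
# none of these has shape (16, n_classes), so something else must produce the logits
```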
But in an online PyTorch implementation of this exact model, I found them using a 1d global average pooling followed by an FC layer to get the n_classes logits, which as I understand is explicitly ruled out by the paper: “After the convolution blocks, the features are fed into a global average pooling layer [17] instead of a fully connected layer, which largely reduces the number of weights. The final label is produced by a softmax layer (Figure 1(b)).”
Here is the implementation I found in the tsai library (implementation source):
class FCN(Module):
    def __init__(self, c_in, c_out, layers=[128, 256, 128], kss=[7, 5, 3]):
        assert len(layers) == len(kss)
        self.convblock1 = ConvBlock(c_in, layers[0], kss[0])
        self.convblock2 = ConvBlock(layers[0], layers[1], kss[1])
        self.convblock3 = ConvBlock(layers[1], layers[2], kss[2])
        self.gap = GAP1d(1)
        self.fc = nn.Linear(layers[-1], c_out)

    def forward(self, x):
        x = self.convblock1(x)
        x = self.convblock2(x)
        x = self.convblock3(x)
        x = self.gap(x)
        return self.fc(x)
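For what it’s worth, this is how I call it (assuming tsai is installed; the tsai.models.FCN import path is my assumption about the library layout, and the shapes are arbitrary):

```python
import torch
from tsai.models.FCN import FCN

model = FCN(c_in=1, c_out=3)       # 1 input channel, 3 classes
x = torch.randn(16, 1, 140)        # (batch, n_channels, n_samples)
print(model(x).shape)              # torch.Size([16, 3]) -- logits coming out of the final nn.Linear
```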
I can see the use of an FC layer in the tsai implementation of the model, but the paper states it is not used to generate the logits? I doubt the implementation is wrong, as this is a very basic model from 2016.
Then, is it that the paper is only saying they are not applying an FC layer directly to the output of the convolutional blocks, and that the parameter saving comes from that, while it is simply assumed that in any classification problem you still use an FC layer to create the logits? But why is it not shown in the figure before the softmax, then? Is there some convention here that I’m missing?
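To sanity-check the parameter-saving part of that guess, here is a quick count (my own back-of-the-envelope comparison, assuming n_samples = 140 and 3 classes):

```python
import torch.nn as nn

n_samples, n_classes = 140, 3                          # assumed values, just for the comparison

fc_after_gap = nn.Linear(128, n_classes)               # FC applied after global average pooling (as in tsai)
fc_on_flat = nn.Linear(128 * n_samples, n_classes)     # FC applied directly to the flattened conv features

def n_params(m): return sum(p.numel() for p in m.parameters())

print(n_params(fc_after_gap))   # 387
print(n_params(fc_on_flat))     # 53763
```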
Thank you in advance for reading this post and helping me understand 🙂