I’m working on a deep learning model that involves a bidirectional GRU (bi-GRU) followed by an Encoder Transformer. My input time series has the shape (batch_size, seq_len, num_features), where num_features is 3. The bi-GRU processes this input, and its output is fed into the Encoder Transformer.
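For reference, the overall pipeline looks roughly like the sketch below; all sizes and hyperparameters here are illustrative placeholders, not my actual values:

import torch
import torch.nn as nn

# Illustrative placeholder sizes, not my real hyperparameters
batch_size, seq_len, num_features = 32, 100, 3
hidden_size = 64
d_model = 2 * hidden_size  # the bi-GRU output width is what the encoder sees

gru = nn.GRU(num_features, hidden_size, batch_first=True, bidirectional=True)
encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

x = torch.randn(batch_size, seq_len, num_features)
gru_out, _ = gru(x)         # (batch_size, seq_len, 2 * hidden_size)
enc_out = encoder(gru_out)  # (batch_size, seq_len, d_model)
# the regression head then has to turn enc_out into a (batch_size, 4) prediction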
The challenge I’m facing is how to design a suitable regression head after the Encoder Transformer. This head should take the transformer’s output and produce 4 predicted values per sample, i.e. an output of shape (batch_size, 4).
Specifically, I’m looking for guidance on the following:
Architecture: What would be an effective architecture for this regression head?
Implementation: How can I implement this regression head in a way that seamlessly integrates with my existing bi-GRU and Encoder Transformer components?
I have implemented the bi-GRU and Encoder Transformer components of my model. The bi-GRU successfully processes the input time series, and the Encoder Transformer further refines the representations.
I’ve tried two different approaches for the regression head:
Linear Layer: I added a simple linear layer with four output units on top of the Encoder Transformer’s output, expecting it to learn to map the transformer’s high-level features to the four target values (a minimal sketch of this follows the list below).
Multi-Layer Perceptron (MLP): I replaced the linear layer with a small MLP (e.g., two hidden layers with ReLU activation) to increase the complexity of the regression head and potentially improve its capacity to model non-linear relationships.
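Roughly, the linear-layer variant looked like the sketch below; the mean-pooling over the time dimension and the d_model name are illustrative assumptions about how to collapse the sequence, which is exactly the part I’m unsure about:

import torch.nn as nn

class LinearHead(nn.Module):
    """Sketch of the plain linear head; d_model is a placeholder for the encoder width."""
    def __init__(self, d_model, num_targets=4):
        super().__init__()
        self.fc = nn.Linear(d_model, num_targets)

    def forward(self, enc_out):          # enc_out: (batch_size, seq_len, d_model)
        pooled = enc_out.mean(dim=1)     # mean-pool over time -> (batch_size, d_model)
        return self.fc(pooled)           # (batch_size, num_targets)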
However, I’m unsure whether these approaches are appropriate or if there are better alternatives. I’m hoping to get feedback on the suitability of these regression heads and any potential improvements.
These are my two regression-head implementations:
self.linear_relu_stack = nn.Sequential(
    nn.Linear(self.bidirectional * self.hidden_size, 256),  # input width: (2 for bi-directional) * hidden_size
    nn.ReLU(),
    nn.Linear(256, 64),
    nn.ReLU(),
    nn.Linear(64, self.num_features),  # output layer; should match the number of regression targets (4)
)
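In the forward pass I apply this stack roughly as follows; the mean-pooling over time is again just one illustrative way to collapse the sequence before the head:

# enc_out: (batch_size, seq_len, 2 * hidden_size) coming out of the Encoder Transformer
pooled = enc_out.mean(dim=1)            # (batch_size, 2 * hidden_size)
preds = self.linear_relu_stack(pooled)  # (batch_size, self.num_features); this should match the 4 targets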
The second implementation is the following (which I don’t really understand):
self.head_nf = self.seq_len * d_model  # flattened feature size fed to the head
if custom_head is not None:
    if isinstance(custom_head, nn.Module):
        self.head = custom_head
    else:
        self.head = custom_head(d_model, self.num_features, self.seq_len)
else:
    self.head = self.create_head(self.head_nf, self.num_features, act=act,
                                 fc_dropout=fc_dropout, y_range=y_range)

def create_head(self, nf, num_features, act="gelu", fc_dropout=0., y_range=None):
    layers = [get_activation_fn(act), nn.Flatten()]  # flatten (batch, seq_len, d_model) -> (batch, nf)
    if fc_dropout:
        layers += [nn.Dropout(fc_dropout)]
    layers += [nn.Linear(nf, num_features)]          # nf = seq_len * d_model
    if y_range:
        layers += [nn.SigmoidRange(*y_range)]        # clamp predictions to y_range via a scaled sigmoid
    return nn.Sequential(*layers)
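As far as I can tell, with the defaults (no custom_head, no y_range) that head reduces to something like the sketch below; seq_len, d_model, and the batch size are placeholders, and since get_activation_fn and SigmoidRange appear to come from an external library (SigmoidRange looks like a fastai layer), the hand-written version just uses plain PyTorch modules:

import torch
import torch.nn as nn

# Hand-written equivalent of create_head for act="gelu", fc_dropout=0.1, y_range=None;
# seq_len, d_model, and the batch size are placeholders.
seq_len, d_model, num_targets = 100, 128, 4
head = nn.Sequential(
    nn.GELU(),
    nn.Flatten(),                               # (batch, seq_len, d_model) -> (batch, seq_len * d_model)
    nn.Dropout(0.1),
    nn.Linear(seq_len * d_model, num_targets),  # one weight per (time step, channel) pair
)

enc_out = torch.randn(8, seq_len, d_model)
print(head(enc_out).shape)                      # torch.Size([8, 4])

So, unlike my MLP head, this one flattens the whole sequence instead of pooling it, and I don’t understand the trade-offs between the two.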