I am training a simple LSTM network in PyTorch to predict a stock price, and I am confused that the network won't fit: the loss explodes, the R² is negative, and nothing improves as training progresses. There must be a fatal mistake somewhere in my code, but I have tried many things and still cannot find it.
Here is my code:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.metrics import r2_score


class LSTMModel(nn.Module):
    def __init__(self, features):
        super(LSTMModel, self).__init__()
        self.lstm1 = nn.LSTM(input_size=features, hidden_size=16, batch_first=True)
        self.dense2 = nn.Linear(16, 1)
        self._init_weights()

    def forward(self, x):
        x, _ = self.lstm1(x)
        # x, _ = self.lstm2(x)
        # Take the output of the last time step as input to the dense layer
        x = x[:, -1, :]
        # x = self.dense1(x)
        x = self.dense2(x)
        return x

    def _init_weights(self):
        for name, param in self.named_parameters():
            if 'weight' in name:
                nn.init.xavier_uniform_(param)
            elif 'bias' in name:
                nn.init.zeros_(param)


# Initialize the model
model = LSTMModel(len(feature_cols))
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)


def train_model(num_epochs):
    for epoch in range(num_epochs):
        model.train()
        total_loss = 0
        for data, target in train_loader:
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output.reshape(len(output)), target)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            total_loss += loss.item()
        scheduler.step()

        model.eval()
        val_loss = 0
        pred = []
        with torch.no_grad():
            for data, target in test_loader:
                output = model(data)
                # print(data, output.reshape(len(output)), target)
                val_loss += criterion(output.reshape(len(output)), target).item()
                pred += list(output.reshape(len(output)))
        val_loss /= len(test_loader)
        r2 = r2_score(test_y, pred)
        print(f'Epoch {epoch + 1}, Train Loss: {total_loss / len(train_loader)}, '
              f'Val Loss: {val_loss}, val r2: {r2}')
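For context, train_loader and test_loader are built from sliding windows over the scaled data, roughly like this (a simplified sketch: window_size, batch_size, and the 80/20 split are placeholders rather than my exact values; scaled is the scaler output described in the notes below, and prices is the raw target series):

import numpy as np
from torch.utils.data import DataLoader, TensorDataset

window_size = 30  # placeholder sliding-window length

# Build (window, next-step target) pairs from the scaled feature matrix.
xs, ys = [], []
for i in range(len(scaled) - window_size):
    xs.append(scaled[i:i + window_size])   # shape: (window_size, features)
    ys.append(prices[i + window_size])     # next-step price as the target

X = torch.tensor(np.array(xs), dtype=torch.float32)
y = torch.tensor(np.array(ys), dtype=torch.float32)

split = int(len(X) * 0.8)                  # chronological train/test split
train_ds = TensorDataset(X[:split], y[:split])
test_ds = TensorDataset(X[split:], y[split:])
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=32, shuffle=False)
test_y = y[split:].numpy()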
I have tried:
- Clipping gradients, as shown in the code. Didn't work.
- Changing the batch size. Didn't work.
- Printing the weights of the network. The weights of the LSTM layer are all very close to 0, while the weights of the dense layer look normal. In other words, I seem to have a vanishing-gradient problem (see the diagnostic snippet after this list).
- Initializing the weights, as shown in the code. Didn't work.
- Changing the model hyperparameters: the number of hidden layers, the learning rate, hidden_size, and so on. None of them worked.
- Changing the number of input features. Didn't work.
- Changing the sliding window of the time-series data. Didn't work.
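For reference, this is the kind of check I used to look at the gradients, called right after loss.backward() and before optimizer.step() (a minimal sketch):

# Sketch: print per-parameter gradient norms to see whether the
# LSTM gradients are vanishing relative to the dense layer.
def report_grad_norms(model):
    for name, param in model.named_parameters():
        if param.grad is not None:
            print(f'{name:30s} grad norm = {param.grad.norm().item():.3e}')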
Some notes:
- I have applied MinMaxScaler to my features prior to feeding them into the model (see the sketch after this list).
- The dataset contains roughly 4000 observations.
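The scaling step itself looks roughly like this (a sketch: df and split_idx are placeholders for my dataframe and the train/test boundary; note that only the features are scaled here, not the target):

from sklearn.preprocessing import MinMaxScaler

# Sketch: fit the scaler on the training portion only, then transform
# everything, so the test period does not leak into the scaling.
scaler = MinMaxScaler()
scaler.fit(df.loc[:split_idx, feature_cols])
scaled = scaler.transform(df[feature_cols])  # feeds the sliding-window step above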