So I was reading up on RNNs in “Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow” by Aurelien Geron and came across a very simple implementation of a sequence-to-sequence RNN in Keras:
import tensorflow as tf

seq2seq_model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(32, return_sequences=True, input_shape=[None, 5]),
    tf.keras.layers.Dense(14)
])
This is supposed to take an arbitrary number of time steps (in this case 56), each with 5 features, and, at every time step, output a 14-value vector containing the predictions for the next 2 weeks.
This is taken from the repo:
https://github.com/ageron/handson-ml3/blob/main/15_processing_sequences_using_rnns_and_cnns.ipynb
but it is not needed to understand this question.
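To make the shapes concrete, here is a small sketch that feeds the model above with made-up random inputs (the dummy data is purely illustrative, not from the book); it shows what I mean by one 14-value prediction vector per time step:

# Sketch with made-up random data, just to check the shapes of the model above.
import numpy as np

for n_steps in (10, 56):                                  # two arbitrary sequence lengths
    dummy = np.random.rand(1, n_steps, 5).astype(np.float32)
    out = seq2seq_model.predict(dummy, verbose=0)
    print(dummy.shape, "->", out.shape)                   # (1, n_steps, 5) -> (1, n_steps, 14)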
So the time series data is passed along in sequences of length 56. After calling predict we effectively get 56 prediction vectors, each containing 14 days of predictions (the first 55 used during training and the last one being the actual prediction). My question is: if there are 32 input neurons that each take a previous time step, how do we get 56 predictions when, for the first 31 time steps, there are not enough preceding time steps to feed all the neurons? Are all the missing inputs set to 0? How is the forward pass completed in these cases? And how is backpropagation on these cases useful when we are trying to train a model that predicts from the full 56 time steps of data? I’m looking for Keras-specific answers and also some general insight into this.
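For context, this is my rough mental model of the layer’s forward pass, written out by hand in NumPy. It is only a guess based on the docs, not the actual Keras code, and the weight names are made up:

import numpy as np

def simple_rnn_forward(x, W_x, W_h, b):
    # x: (n_steps, 5) inputs; W_x: (5, 32); W_h: (32, 32); b: (32,)
    h = np.zeros(W_h.shape[0])                 # state starts at zeros
    outputs = []
    for t in range(x.shape[0]):                # one update per time step
        h = np.tanh(x[t] @ W_x + h @ W_h + b)  # same cell reused at every step
        outputs.append(h)
    return np.stack(outputs)                   # one 32-vector per time step

If that is roughly right, the 32 would be the size of the recurrent state rather than a window over 32 previous time steps, but I am not certain this is what Keras actually does, which is why I am asking.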
I tried running this code:
import numpy as np

seq_length = 56  # as used in the notebook
X = mulvar_valid.to_numpy()[np.newaxis, :seq_length]  # shape (1, 56, 5)
y_pred_14 = seq2seq_model.predict(X.astype(np.float32))
print(len(y_pred_14[0]))  # 56
The output did indeed contain 56 prediction vectors. I expected a 32-step window rolling across the 56 time steps to produce only around 25 prediction vectors (56 - 32 + 1 = 25), not 56.
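For comparison, this is the sliding-window behaviour I was expecting (purely hypothetical, not something the book or Keras describes):

n_steps, window = 56, 32
n_windows = n_steps - window + 1    # only 25 complete 32-step windows fit in 56 steps
print(n_windows)                    # 25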