My question is how N_u LSTM units operate on an input of length N_x. I know there are many similar questions asked before, but the answers are full of contradictions and confusion, so I am trying to clear my doubts by asking specific questions. I am following the simple blog here:
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Q0) Is the Keras implementation consistent with the above blog?
Please consider the following code.
import tensorflow as tf

N_u, N_x = 1, 1
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(N_u, stateful=True, batch_input_shape=(32, 1, N_x))
])
model.summary()
For simplicity, my input here is just a scalar and I have one time step. The output shape is (32, 1) and the number of parameters is 12.
Q1) I have one LSTM unit or cell, right? The following represents a cell, right?
I understand from the picture that there should be 12 parameters: forget gate = 2 weights + 1 bias; input gate = 2 × (2 weights + 1 bias), counting the candidate update; output gate = 2 weights + 1 bias. So everything is fine up to this point.
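To double-check my hand count without Keras, here is a quick tally. I am assuming the cell has 4 gate-like weight sets in total (forget, input, candidate, output), each with N_x input weights, N_u recurrent weights, and 1 bias per unit:

```python
# Hand count of LSTM parameters for N_u = N_x = 1 (my assumption:
# 4 gates -- forget, input, candidate, output -- each with one
# input weight, one recurrent weight, and one bias).
N_u, N_x = 1, 1
gates = 4
params = gates * (N_x + N_u + 1) * N_u
print(params)  # 12, matching model.summary()
```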
Q2) Now let us set N_u, N_x = 1, 2. I expect the same cell to be applied to the two elements of x. But I find that the total number of parameters is now 16! Why? Is it because I get 4 additional weight parameters for the connections between x_2 and the LSTM unit?
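If my guess is right, going from N_x = 1 to N_x = 2 would add exactly one input weight per gate, i.e. 4 extra parameters on top of the 12 from Q1:

```python
# My guess for Q2: each of the 4 gates gains one weight for the new
# input component x_2, so the count goes from 12 to 12 + 4 = 16.
N_u = 1
gates = 4
params_1_input = gates * (1 + N_u + 1) * N_u   # N_x = 1
params_2_inputs = gates * (2 + N_u + 1) * N_u  # N_x = 2
print(params_1_input, params_2_inputs)  # 12 16
```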
Q3) Now let us set N_u, N_x = 2, 1. I now have two LSTM units. Are these two units completely independent, or do they influence each other? I expected the number of parameters to be 2 × 12 = 24, but in reality I got 32. Why 32?
Q4) If I set N_u, N_x = 2, 2, the number of parameters is 40. I think I can derive this once I understand the two points above.
Thank you in advance.