I want to use binary classification for time-series prediction: when the signal coming from a sensor changes from one shape to another, I want to detect that the system has degraded. I have two possible outputs/labels: 0 or 1. I label a signal with 0 when the full signal looks like this (the signal is fine):
And I label it with 1 when the signal is compressed like this (the signal is degraded):
The point in this case, as can be checked in the figures, is that the sequence length is reduced across the different signals: the first signal has around 786 time points, while the last one has only 64 (the length decreases from one sequence to the next). I have been searching for how to implement this and I have used the second answer from this thread.
Regarding the inputs:
- The input x is named x_list; it is a list of 10 numpy arrays of variable length (from 47 to 786 time points), formed by float values between 0.5 and 1.6. It looks like this:
- The y is named y_list; it is also a list of numpy arrays, each holding only a 0 or a 1, so it is pretty straightforward and looks like this (a synthetic sketch of how both lists could be built is shown right after this list):
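For reference, here is a minimal sketch of how two such lists could be constructed with synthetic data; the lengths, values and label split below are made up purely for illustration and do not come from the sensor file:

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical, decreasing sequence lengths (786 down to 47), for illustration only
lengths = [786, 512, 400, 300, 200, 150, 120, 90, 64, 47]
# Each sequence: floats between 0.5 and 1.6, one feature per time step
x_list = [rng.uniform(0.5, 1.6, size=(n, 1)) for n in lengths]
# One label per sequence: 0 = fine, 1 = degraded (split chosen arbitrarily here)
y_list = [np.array([0]) if n > 200 else np.array([1]) for n in lengths]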
This is my full code (except the part that reads the values from the file):
import numpy as np
import tensorflow as tf

num_sequences = len(x_list)
num_features = len(x_list[0][0])  # number of features per time step
batch_size = 5
batches_per_epoch = 2

def train_generator():
    # Sort by length so the number of timesteps in each batch is minimized
    x_list.sort(key=len)
    y_list.sort(key=len)
    # Generate batches
    while True:
        for b in range(batches_per_epoch):
            longest_index = (b + 1) * batch_size - 1
            timesteps = len(x_list[longest_index])
            # Zero-pad every sequence in the batch to the longest one
            x_train = np.zeros((batch_size, timesteps, num_features))
            y_train = np.zeros((batch_size, timesteps, 1))
            for i in range(batch_size):
                li = b * batch_size + i
                x_train[i, 0:len(x_list[li]), :] = x_list[li]
                y_train[i, 0:len(y_list[li]), 0] = y_list[li]
            yield x_train, y_train

model = tf.keras.models.Sequential([
    tf.keras.layers.Masking(mask_value=0., input_shape=(None, num_features)),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(2, activation=tf.nn.softmax)
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_generator(), steps_per_epoch=batches_per_epoch, epochs=100)

ypredict1 = model.predict(x_list[0])
ypredict2 = model.predict(x_list[5])
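As a quick sanity check (not part of the original code), one can pull a single batch from the generator and print its shapes to see exactly what the network is fed; the shapes in the comments assume one feature per time step, as described above:

gen = train_generator()
x_batch, y_batch = next(gen)
print(x_batch.shape)   # (5, timesteps of the longest sequence in this batch, 1)
print(y_batch.shape)   # (5, same number of timesteps, 1) -- mostly zero padding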
Although the model is able to train, I don't understand the predictions I get or their dimensions. If I predict on an input that I labeled with a zero, for example the first input used for training (x_list[0]), I get the following:
What exactly are these values? If I do the same with x_list[7], which I labeled with a one, I get a similar result:
The values are very similar, which I don’t understand, as I would expect the second one to be closer to zero.
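For reference, a small sketch of how the prediction output can be inspected, assuming each element of x_list has shape (timesteps, 1); with an explicit batch dimension, model.predict on this model returns one softmax pair per time step:

x0 = x_list[0][np.newaxis, ...]   # add a batch dimension -> shape (1, timesteps, 1)
pred = model.predict(x0)          # shape (1, timesteps, 2): per-timestep softmax over the two classes
print(pred.shape)
print(pred[0, -1])                # the two class probabilities at the last time step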
So my questions are:
- Is this a coherent way to approach the problem?
- Am I feeding the NN correctly for my problem?
- What are my predictions returning? Do they make sense?
Any suggestion to improve the code is appreciated.