I am trying to use an unsupervised LSTM autoencoder to detect anomalies in my data. The idea is that if the reconstruction differs too much from the input, the point should be categorised as anomalous. My dataset has the columns ['Day', 'Result']. The training set contains only non-anomalous data, while the test set contains some anomalous data points.
trainX shape: (46779, 50) | trainY shape: (46779,)
testX shape: (12447, 50) | testY shape: (12447,)
Sequence creation:
import numpy as np

seq_size = 50

def to_sequences_per_day(df, seq_size=1):
    x_values = []
    y_values = []
    # Group by day so that sequences never cross a day boundary
    for day, group in df.groupby('Day'):
        day_results = group['Result'].values
        # Slide a window of length seq_size over this day's results
        for i in range(len(day_results) - seq_size):
            x_values.append(day_results[i:(i + seq_size)])
            y_values.append(day_results[i + seq_size])
    return np.array(x_values), np.array(y_values)
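For context, this is roughly how I build the arrays above (train_df and test_df here stand for my train/test splits of the DataFrame):

trainX, trainY = to_sequences_per_day(train_df, seq_size=seq_size)
testX, testY = to_sequences_per_day(test_df, seq_size=seq_size)
print(trainX.shape, trainY.shape)  # (46779, 50) (46779,)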
Model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Define the model
model = Sequential()
# Encoder
model.add(LSTM(64, input_shape=(seq_size, 1), return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(16, return_sequences=False))
# Decoder
model.add(Dense(32, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile and train the model
optimizer = Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='mae')
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
history = model.fit(trainX, trainX, epochs=100, batch_size=32,
                    validation_split=0.1, callbacks=[early_stopping], verbose=1)
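For completeness, this is roughly how I intend to flag anomalies once the model trains properly: compute the reconstruction error on the training data, derive a threshold from its distribution, and flag test points whose error exceeds it. This is only a sketch, and the 99th-percentile cut-off is an arbitrary choice on my part:

# Per-sequence reconstruction error (MAE) on the training data
train_pred = model.predict(trainX)
train_err = np.mean(np.abs(train_pred - trainX), axis=1)

# Threshold chosen as the 99th percentile of training error (arbitrary)
threshold = np.percentile(train_err, 99)

# Flag test sequences whose reconstruction error exceeds the threshold
test_pred = model.predict(testX)
test_err = np.mean(np.abs(test_pred - testX), axis=1)
anomalies = test_err > threshold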
Sample dataset (not the same one I am actually using):
import pandas as pd

data = pd.DataFrame({
    'Day': [1, 1, 2, 3, 3, 3, ...],
    'Result': [150, 158, 114, 120, 160, 125, ...]
})
# 'Result' will later be normalised to the range 0 to 1
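The normalisation I have in mind is a plain min-max scaling fitted on the training data only, something like the following (using scikit-learn's MinMaxScaler; train_df and test_df are the assumed split names again):

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
# Fit on the training data only, then reuse the same transform for test data
train_df[['Result']] = scaler.fit_transform(train_df[['Result']])
test_df[['Result']] = scaler.transform(test_df[['Result']])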
Could someone suggest what might be the problem? The validation loss is really low and always reaches a plateau. The goal is to learn the Result pattern and reconstruct it.