I'm building an LSTM model to forecast the future total number of COVID-19 cases using the OWID dataset.
The problem is that I get all-zero predictions. This does not happen when I use a univariate series with only the total_cases column.
Here is the code I use:
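import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
# Keras import path assumed; adjust if using standalone keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
url = "https://covid.ourworldindata.org/data/owid-covid-data.csv"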
df = pd.read_csv(url)
# Filter the data for a specific location, e.g., 'United States'
location = 'United States'
df_location = df[df['location'] == location]
# Select relevant columns and set the date column as the index
selected_columns = ['date', 'new_cases', 'new_deaths', 'total_cases', 'total_deaths', 'reproduction_rate']
df_location = df_location[selected_columns].copy()  # copy so the assignments below do not hit a view of df
df_location['date'] = pd.to_datetime(df_location['date'])
df_location.set_index('date', inplace=True)
# Handle missing values by filling them with the mean of the column
df_location.fillna(df_location.mean(), inplace=True)
# Function to create a multivariate dataset for LSTM
def create_multivariate_dataset(data, time_step=1):
    X, Y = [], []
    for i in range(len(data) - time_step - 1):
        X.append(data[i:(i + time_step)])   # window of time_step rows, all features
        Y.append(data[i + time_step, 0])    # target is column 0, i.e. 'new_cases' given selected_columns
    return np.array(X), np.array(Y)
# Convert the data to numpy array and scale it
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(df_location)
# Create the dataset with a specified time step, e.g., 60 days
time_step = 60
X, y = create_multivariate_dataset(scaled_data, time_step)
# Reshape the input to be [samples, time steps, features] for LSTM
X = X.reshape(X.shape[0], X.shape[1], X.shape[2])
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define the LSTM model
model = Sequential()
model.add(LSTM(100, return_sequences=True, input_shape=(time_step, X.shape[2]))) # Adjusted input_shape
model.add(LSTM(100, return_sequences=False))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='relu'))
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# Train the model
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=100, batch_size=32, verbose=1)
# Make predictions on the test set
test_predict = model.predict(X_test)
# Inverse transform the predictions to get actual values
test_predict_full = np.concatenate((test_predict, np.zeros((test_predict.shape[0], scaled_data.shape[1] - 1))), axis=1)
test_predict = scaler.inverse_transform(test_predict_full)[:,0]
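The zero-padding trick in the last two lines can be checked on toy data. The following is a minimal sketch, not the code from the post: it writes min-max scaling by hand (mirroring the formula MinMaxScaler uses for feature_range=(0, 1)), and the array `raw` and all variable names are made up for illustration.

```python
import numpy as np

# Toy data standing in for the real frame: 4 rows, 3 feature columns (hypothetical values).
raw = np.array([[10.0, 1.0, 100.0],
                [20.0, 2.0, 200.0],
                [30.0, 3.0, 300.0],
                [40.0, 4.0, 400.0]])

# Min-max scaling to [0, 1], written by hand to mirror MinMaxScaler's formula.
col_min = raw.min(axis=0)
col_max = raw.max(axis=0)
scaled = (raw - col_min) / (col_max - col_min)

# Pretend the model predicted column 0 perfectly; shape (n_samples, 1).
pred_scaled = scaled[:, [0]]

# Pad with zeros for the remaining columns, as the post's code does.
padded = np.concatenate(
    (pred_scaled, np.zeros((pred_scaled.shape[0], raw.shape[1] - 1))), axis=1
)

# Inverse transform and keep only column 0.
recovered = (padded * (col_max - col_min) + col_min)[:, 0]
print(recovered)

# A scaled prediction of exactly 0 maps back to the column minimum.
zero_pred = np.zeros((1, raw.shape[1]))
zero_back = (zero_pred * (col_max - col_min) + col_min)[0, 0]
print(zero_back)
```

The padding itself recovers column 0 correctly, so it is probably not the source of the zeros. A scaled prediction of 0 maps back to the column minimum, so if the minimum of new_cases is 0 (an assumption about the data, not checked here), zeros in scaled space come back as zeros.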
The output of test_predict is always all zeros.
What am I doing wrong?
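One possible cause worth ruling out (a guess, not a confirmed diagnosis) is the `activation='relu'` on the final `Dense(1)` layer. If that unit's pre-activation becomes negative for every sample, its output is 0 and the gradient through it is also 0, so training cannot move it back. The sketch below shows this with plain NumPy; the values in `z` and `y_true` are hypothetical and are not taken from the model above.

```python
import numpy as np

def relu(z):
    # ReLU clips negative inputs to zero.
    return np.maximum(z, 0.0)

def relu_grad(z):
    # Derivative of ReLU: 1 where z > 0, else 0.
    return (z > 0).astype(float)

# Hypothetical pre-activations of the final Dense(1) unit for five samples,
# all negative (for example after an unlucky initialisation or a large update).
z = np.array([-0.8, -0.1, -2.3, -0.5, -1.2])
y_true = np.array([0.2, 0.4, 0.1, 0.3, 0.5])  # scaled targets in [0, 1]

y_pred = relu(z)

# Gradient of the mean squared error with respect to z: dL/dy * dy/dz.
grad_z = (2.0 * (y_pred - y_true) / len(z)) * relu_grad(z)

print(y_pred)   # every prediction is zero
print(grad_z)   # every gradient entry is zero, so no learning signal reaches the unit
```

If this is what is happening, `model.predict(X_test)` would already be all zeros before the inverse transform. The usual thing to try would be a linear output layer, i.e. `Dense(1)` with no activation.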