I have a dataset with two inputs and one output. I want to use GPflow with a linear kernel so that the prediction is a straight line. I have trained the model, but the predicted mean does not look like a straight line when I plot it. Even setting a small variance for the linear kernel does not produce linear behavior. I am fairly new to this method and am not sure why this happens.
Here is an example of the data:
data = {"input1": [1384, 1366, 1349, 1331, 1313, 1296, 1190, 1065, 852, 6364],
"input2": [215.84, 215.89, 216.13, 216.47, 216.82, 217.07, 217.10, 217.01, 216.84, 216.67],
"output": [149.46, 149.51, 149.47, 149.34, 149.14, 148.93, 148.80, 148.71, 148.70, 148.77]}
df = pd.DataFrame(data)
And this is the code:
import gpflow
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
X = df[['input1', 'input2']]
Y = df[['output']]
scaler_X = StandardScaler()
scaler_Y = StandardScaler()
X_scaled = scaler_X.fit_transform(X)
Y_scaled = scaler_Y.fit_transform(Y)
kernel = gpflow.kernels.Linear()
model = gpflow.models.GPR(data=(X_scaled, Y_scaled), kernel=kernel)
#model.likelihood.variance = gpflow.Parameter(1e-6, trainable=False, transform=gpflow.utilities.positive())
gpflow.utilities.print_summary(model)
opt = gpflow.optimizers.Scipy()
opt.minimize(model.training_loss, variables=model.trainable_variables, options={'maxiter': 100})
# Predict on the training data
Y_mean, Y_var = model.predict_f(X_scaled)
Y_mean = Y_mean.numpy()
Y_var = Y_var.numpy()
x_index = np.arange(len(Y))
plt.figure(figsize=(12, 6))
plt.plot(x_index, Y_scaled, 'kx', mew=2, label='Actual data')
plt.plot(x_index, Y_mean, 'b', lw=2, label='Mean prediction')
plt.fill_between(x_index,
                 (Y_mean - 1.96 * np.sqrt(Y_var)).flatten(),
                 (Y_mean + 1.96 * np.sqrt(Y_var)).flatten(),
                 color='blue', alpha=0.2, label='95% confidence interval')
plt.legend()
plt.xlabel('Row Number')
plt.ylabel('output')
plt.title('Gaussian Process Regression with Linear Kernel')
plt.show()
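As a sanity check on what "linear" should mean here, below is a minimal NumPy-only sketch of GP regression with a linear kernel on the same ten rows. It does not use GPflow, and `kernel_variance` and `noise_variance` are assumed values chosen for illustration, not the ones the optimizer would fit. It tests whether the posterior mean can be written as `Xs @ w`, that is, whether it is linear in the two inputs, which is not the same thing as being a straight line when plotted against the row number.

```python
import numpy as np

# Same ten rows as in the question.
X = np.array([
    [1384, 215.84], [1366, 215.89], [1349, 216.13], [1331, 216.47],
    [1313, 216.82], [1296, 217.07], [1190, 217.10], [1065, 217.01],
    [852, 216.84], [6364, 216.67],
], dtype=float)
y = np.array([149.46, 149.51, 149.47, 149.34, 149.14,
              148.93, 148.80, 148.71, 148.70, 148.77])

# Standardize with the population std (ddof=0), as StandardScaler does.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()

# Assumed hyperparameters for illustration only (not fitted values).
kernel_variance = 1.0
noise_variance = 0.1

# Linear kernel without a bias term: k(x, x') = variance * <x, x'>.
K = kernel_variance * Xs @ Xs.T

# Standard GP regression posterior mean at the training inputs.
alpha = np.linalg.solve(K + noise_variance * np.eye(len(Xs)), ys)
mean = K @ alpha

# The same mean expressed as a weight vector on the inputs.
w = kernel_variance * Xs.T @ alpha
max_dev = np.max(np.abs(mean - Xs @ w))

print("weights on (input1, input2):", w)
print("max |mean - Xs @ w|:", max_dev)
```

If `max_dev` is at the level of floating-point error, the mean is a plane over (`input1`, `input2`). Plotting that plane against the row index can still look curved, because the inputs do not change linearly from row to row.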