I have the following function:
from numpy import typing as npt

def prediction_interval(X: npt.NDArray):
    y_pred, y_pis = mapie_regressor.predict(X, alpha=0.05)
    y_lower, y_upper = y_pis[:, 0, 0], y_pis[:, 1, 0]
    return y_pred, y_lower, y_upper
which receives a 2D array X of shape (n_samples, n_features) as input and returns three 1D arrays of length n_samples. For brevity, I'm not including the code to fit a MAPIE regressor (you can find an example here: https://mapie.readthedocs.io/en/latest/generated/mapie.regression.MapieRegressor.html). You just need to know that it has the same interface as a scikit-learn univariate regressor, but it returns three 1D arrays instead of one, i.e., univariate regression with prediction intervals.
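To make the slicing concrete, here is a small sketch that uses a made-up array in place of MAPIE's output. It assumes the interval array has shape (n_samples, 2, n_alpha), with lower bounds at index 0 and upper bounds at index 1 of the middle axis; the numbers are arbitrary placeholders, not real predictions.

```python
import numpy as np

n_samples = 4

# Stand-in for the second value returned by mapie_regressor.predict(X, alpha=0.05).
# Assumed layout: axis 1 holds (lower, upper), axis 2 indexes the alpha values
# (a single alpha here, so its length is 1).
y_pis = np.zeros((n_samples, 2, 1))
y_pis[:, 0, 0] = [0.0, 1.0, 2.0, 3.0]  # placeholder lower bounds
y_pis[:, 1, 0] = [1.0, 2.0, 3.0, 4.0]  # placeholder upper bounds

# Same slicing as in prediction_interval: each result is 1D of length n_samples.
y_lower, y_upper = y_pis[:, 0, 0], y_pis[:, 1, 0]
print(y_lower.shape, y_upper.shape)
```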
I want to extend this to multivariate regression with prediction intervals. In other words, I now have a list mapie_regressors of length n_outputs containing mapie_regressor objects, and I want to return three 2D arrays y_pred, y_lower, y_upper of shape (n_samples, n_outputs). These arrays store, for each sample, the predictions and prediction intervals of all n_outputs regression models. I tried the following:
import numpy as np
from joblib import Parallel, delayed

n_jobs = -1

def prediction_intervals(mapie_regressors: list, X: npt.NDArray):
    # predict_interval is assumed to be each model's equivalent of
    # prediction_interval above, returning (y_pred, y_lower, y_upper)
    tmp = Parallel(n_jobs=n_jobs)(
        delayed(model.predict_interval)(X) for model in mapie_regressors
    )
    # tmp is a list of n_outputs tuples of three 1D arrays
    y_pred = np.array([x[0] for x in tmp]).T
    y_lower = np.array([x[1] for x in tmp]).T
    y_upper = np.array([x[2] for x in tmp]).T
    return y_pred, y_lower, y_upper
This works, but the three list comprehensions are not exactly fast. Is there a way to make this code faster?
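One possible direction, sketched here under assumptions rather than offered as a measured improvement: since tmp is a list of n_outputs tuples, each holding three 1D arrays of equal length, it can be converted to a single 3D array once and then transposed, instead of being traversed three times. The tmp list below is a random stand-in for the output of Parallel, and the sizes are arbitrary. Whether this is noticeably faster would need to be timed, since the predict calls themselves may dominate the runtime.

```python
import numpy as np

n_samples, n_outputs = 5, 4
rng = np.random.default_rng(0)

# Stand-in for the list returned by Parallel: one (y_pred, y_lower, y_upper)
# tuple of 1D arrays per model. Values are random placeholders.
tmp = [
    tuple(rng.normal(size=n_samples) for _ in range(3))
    for _ in range(n_outputs)
]

# Single conversion: shape (n_outputs, 3, n_samples).
stacked = np.asarray(tmp)

# Reorder the axes to (3, n_samples, n_outputs), then unpack along axis 0
# to get three arrays of shape (n_samples, n_outputs).
y_pred, y_lower, y_upper = stacked.transpose(1, 2, 0)
print(y_pred.shape, y_lower.shape, y_upper.shape)
```

The unpacked arrays are views into stacked, so no further copies are made after the initial conversion.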