I want to train an XGBoost model that takes in 2 (or more) inputs and produces 2 output values. These are to be interpreted as parameters of a probability distribution for the outcome, for example, the mean and log-variance of a Normal distribution. To train XGBoost to output these parameters I want to implement the negative log likelihood (here that of the Normal distribution) as a custom objective function.
Here is what I have so far:
import numpy as np
import scipy as sci
import xgboost as xgb  # xgb.__version__ is 2.0.3
# features
X = sci.stats.norm.rvs(size=(32,2))
# targets (samples from normal distribution with parameters dependent on X)
z = sci.stats.norm.rvs(loc = X[:,0], scale = np.exp(X[:,1]), size=32)
# add a column of zeros to targets to tell XGBoost to return 2 outputs
y = np.stack([z, np.zeros_like(z)], axis=1)
# combine features and targets in a xgb.DMatrix object
dtrain = xgb.DMatrix(X, label=y)
One issue I had here is that dtrain.get_label() now returns a 1d array of length 2*n (with every second value being zero) rather than the (n, 2) array I put in. Given that DMatrix seems to flatten the labels array, maybe it is not intended to be used with multiple labels — any advice on this? Anyway, moving on:
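For reference, the flattening seems to be plain row-major (C-order) raveling, so the original array can be recovered inside the objective. A minimal numpy-only sketch of that assumption (not using DMatrix itself):

```python
import numpy as np

# toy (n, 2) labels like y above: real targets plus a padding column of zeros
y = np.stack([np.arange(4.0), np.zeros(4)], axis=1)

# DMatrix appears to hand labels back flattened in C (row-major) order,
# so every second entry is the padding zero:
flat = y.ravel(order="C")
print(flat)  # [0. 0. 1. 0. 2. 0. 3. 0.]

# which is why reshape(-1, 2) recovers the (n, 2) array inside the objective
recovered = flat.reshape(-1, 2)
print(np.array_equal(recovered, y))  # True
```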
Following the docs I implement gradient and hessian of the negative normal log likelihood and use them in xgb.train()
as follows:
from typing import Tuple
# custom gradient of the neg. normal log-likelihood with respect to the predictions
def gradient(pred: np.ndarray, train: xgb.DMatrix) -> np.ndarray:
    # drop the padding zeros from the flattened labels array
    y0 = train.get_label().reshape(-1, 2)[:, 0]
    yh1 = pred[:, 0]  # predicted mean
    yh2 = pred[:, 1]  # predicted log-variance
    gg = np.array([-2 * np.exp(-yh2) * (y0 - yh1),
                   1 - 0.5 * np.exp(-yh2) * (y0 - yh1) ** 2]).T
    print('gradient:', gg, '\n')
    return gg
# custom hessian of the neg. normal log-likelihood with respect to the predictions
def hessian(pred: np.ndarray, train: xgb.DMatrix) -> np.ndarray:
    y0 = train.get_label().reshape(-1, 2)[:, 0]
    yh1 = pred[:, 0]
    yh2 = pred[:, 1]
    hh = np.array([[2 * np.exp(-yh2), 2 * np.exp(-yh2) * (y0 - yh1)],
                   [2 * np.exp(-yh2) * (y0 - yh1), 0.5 * np.exp(-yh2) * (y0 - yh1) ** 2]]).T
    print('hessian:', hh, '\n')
    return hh
def nll(pred: np.ndarray, train: xgb.DMatrix) -> Tuple[np.ndarray, np.ndarray]:
    grad = gradient(pred, train)
    hess = hessian(pred, train)
    return grad, hess
np.random.seed(123)
mod = xgb.train({'tree_method': 'hist', 'seed': 5},
                dtrain=dtrain,
                num_boost_round=10,
                obj=nll)
If I rerun the last few lines, I always get different results despite fixing the seed. Most of the time I end up getting nans for the gradient and Hessian.
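To sanity-check my derivatives, I also tried comparing an analytic gradient of the per-sample NLL against central finite differences. This is a standalone sketch with my own helpers (nll_normal, grad_nll_normal, fd_grad — none of this is XGBoost API), assuming the (mean, log-variance) parameterisation:

```python
import numpy as np

def nll_normal(y, mu, log_var):
    # per-sample negative log likelihood of a Normal, constants dropped
    return 0.5 * (log_var + np.exp(-log_var) * (y - mu) ** 2)

def grad_nll_normal(y, mu, log_var):
    # analytic gradient of nll_normal with respect to (mu, log_var)
    r = y - mu
    return np.array([-np.exp(-log_var) * r,
                     0.5 - 0.5 * np.exp(-log_var) * r ** 2])

def fd_grad(f, params, eps=1e-6):
    # central finite differences as a reference gradient
    g = np.zeros_like(params, dtype=float)
    for i in range(len(params)):
        p_plus, p_minus = params.copy(), params.copy()
        p_plus[i] += eps
        p_minus[i] -= eps
        g[i] = (f(p_plus) - f(p_minus)) / (2 * eps)
    return g

y_obs = 0.7
params = np.array([0.2, -0.3])  # (mu, log_var)
analytic = grad_nll_normal(y_obs, *params)
numeric = fd_grad(lambda p: nll_normal(y_obs, p[0], p[1]), params)
print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

This at least confirms the analytic formulas match the numerical gradient at a benign point; it does not yet explain the nans, which I suspect come from exp() overflowing for large predicted log-variances.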
- Can all this even work, or is XGBoost not the right tool?
- Am I using xgb.DMatrix correctly?
- How do I make XGBoost reproducible?
- How can I go about understanding and avoiding the nans in my gradients and Hessians?