I want to train an XGBoost model that takes in 2 (or more) inputs and produces 2 output values. These are to be interpreted as parameters of a probability distribution for the outcome, for example, the mean and log-variance of a Normal distribution. To train XGBoost to output these parameters I want to implement the negative log likelihood (here that of the Normal distribution) as a custom objective function.
Here is what I have so far:
import numpy as np
import scipy as sci
import xgboost as xgb  # xgb.__version__ is 2.0.3
# features
X = sci.stats.norm.rvs(size=(32,2))
# targets (samples from normal distribution with parameters dependent on X)
z = sci.stats.norm.rvs(loc = X[:,0], scale = np.exp(X[:,1]), size=32)
# add a column of zeros to targets to tell XGBoost to return 2 outputs
y = np.stack([z, np.zeros_like(z)], axis=1)
# combine features and targets in a xgb.DMatrix object
dtrain = xgb.DMatrix(X, label=y)
One issue I had here is that dtrain.get_label() now returns a 1d array of length 2*n (with every second value being zero) rather than the (n, 2) array I put in. Given that DMatrix seems to flatten the labels array, maybe it is not intended to be used with multiple labels — any advice on this? Anyway, moving on:
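For reference, the flattening seems to be plain row-major (C-order) raveling, so the original array can be recovered inside the objective. A minimal numpy-only sketch of that assumption (not using DMatrix itself):

```python
import numpy as np

# toy (n, 2) labels like y above: real targets plus a padding column of zeros
y = np.stack([np.arange(4.0), np.zeros(4)], axis=1)

# DMatrix appears to hand labels back flattened in C (row-major) order,
# so every second entry is the padding zero:
flat = y.ravel(order="C")
print(flat)  # [0. 0. 1. 0. 2. 0. 3. 0.]

# which is why reshape(-1, 2) recovers the (n, 2) array inside the objective
recovered = flat.reshape(-1, 2)
print(np.array_equal(recovered, y))  # True
```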
Following the docs I implement gradient and hessian of the negative normal log likelihood and use them in xgb.train()
as follows:
from typing import Tuple
# custom gradient of the neg. normal log-likelihood with respect to the predictions
def gradient(pred: np.ndarray, train: xgb.DMatrix) -> np.ndarray:
    # drop the padding zeros from the flattened labels array
    y0 = train.get_label().reshape(-1, 2)[:, 0]
    yh1 = pred[:, 0]  # predicted mean
    yh2 = pred[:, 1]  # predicted log-variance
    gg = np.array([-2 * np.exp(-yh2) * (y0 - yh1),
                   1 - 0.5 * np.exp(-yh2) * (y0 - yh1) ** 2]).T
    print('gradient:', gg, '\n')
    return gg
# custom hessian of the neg. normal log-likelihood with respect to the predictions
def hessian(pred: np.ndarray, train: xgb.DMatrix) -> np.ndarray:
    y0 = train.get_label().reshape(-1, 2)[:, 0]
    yh1 = pred[:, 0]
    yh2 = pred[:, 1]
    hh = np.array([[2 * np.exp(-yh2), 2 * np.exp(-yh2) * (y0 - yh1)],
                   [2 * np.exp(-yh2) * (y0 - yh1), 0.5 * np.exp(-yh2) * (y0 - yh1) ** 2]]).T
    print('hessian:', hh, '\n')
    return hh
def nll(pred: np.ndarray, train: xgb.DMatrix) -> Tuple[np.ndarray, np.ndarray]:
    grad = gradient(pred, train)
    hess = hessian(pred, train)
    return grad, hess
np.random.seed(123)
mod = xgb.train({'tree_method': 'hist', 'seed': 5},
                dtrain=dtrain,
                num_boost_round=10,
                obj=nll)
If I rerun the last few lines, I always get different results despite fixing the seed. Most of the time I end up getting nans for the gradient and Hessian.
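To sanity-check my derivatives, I also tried comparing an analytic gradient of the per-sample NLL against central finite differences. This is a standalone sketch with my own helpers (nll_normal, grad_nll_normal, fd_grad — none of this is XGBoost API), assuming the (mean, log-variance) parameterisation:

```python
import numpy as np

def nll_normal(y, mu, log_var):
    # per-sample negative log likelihood of a Normal, constants dropped
    return 0.5 * (log_var + np.exp(-log_var) * (y - mu) ** 2)

def grad_nll_normal(y, mu, log_var):
    # analytic gradient of nll_normal with respect to (mu, log_var)
    r = y - mu
    return np.array([-np.exp(-log_var) * r,
                     0.5 - 0.5 * np.exp(-log_var) * r ** 2])

def fd_grad(f, params, eps=1e-6):
    # central finite differences as a reference gradient
    g = np.zeros_like(params, dtype=float)
    for i in range(len(params)):
        p_plus, p_minus = params.copy(), params.copy()
        p_plus[i] += eps
        p_minus[i] -= eps
        g[i] = (f(p_plus) - f(p_minus)) / (2 * eps)
    return g

y_obs = 0.7
params = np.array([0.2, -0.3])  # (mu, log_var)
analytic = grad_nll_normal(y_obs, *params)
numeric = fd_grad(lambda p: nll_normal(y_obs, p[0], p[1]), params)
print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

This at least confirms the analytic formulas match the numerical gradient at a benign point; it does not yet explain the nans, which I suspect come from exp() overflowing for large predicted log-variances.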
- Can all this even work, or is XGBoost not the right tool?
- Am I using xgb.DMatrix correctly?
- How do I make XGBoost reproducible?
- How can I go about understanding and avoiding the nans in my gradients and Hessians?