I have an existing LightGBM model that I'm trying to continue training. It has 9 features, and I'm trying to add a 10th. I don't have the source materials for the original model, only the 9Models.txt file that is used for the booster.
import lightgbm as lgb
import numpy as np

data = np.array(data)
label = np.array(label)
params = {}
train_data = lgb.Dataset(data, label=label, free_raw_data=False)
gbm = lgb.train(params, train_data, keep_training_booster=True, num_boost_round=10, init_model='9Models.txt')
When I try to run this, I get:
lightgbm.basic.LightGBMError: The number of features in data (1) is not the same as it was in training data (11).
I’m at a loss for what to do next here.
I tried changing parameters. For example, I tried setting:
predict_disable_shape_check=true
but that leads to this error:
lightgbm.basic.LightGBMError: Number of class for initial score error
Continued training with LightGBM is for adding more trees to an existing model; you cannot use it to add new features.
The number and order of features in the data presented for training continuation must be the same as the number and order of features presented in training the original model.
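For completeness: continued training itself does work as long as the new data has exactly the same features as the original training data. Here is a minimal sketch (assuming a regression objective and synthetic data, not your actual setup) that continues from a saved model file, the same way you are loading 9Models.txt:

import lightgbm as lgb
from sklearn.datasets import make_regression

# 5,000 rows, 9 features (a stand-in for your real data)
X, y = make_regression(n_samples=5_000, n_features=9, n_informative=9)

# train an initial model on the first 4,000 rows and save it to a file
bst = lgb.train(
    params={"objective": "regression"},
    train_set=lgb.Dataset(data=X[:4000], label=y[:4000]),
    num_boost_round=10
)
bst.save_model("9Models.txt")

# continued training works here because the new rows also have 9 features,
# in the same order
bst_continued = lgb.train(
    params={"objective": "regression"},
    train_set=lgb.Dataset(data=X[4000:], label=y[4000:]),
    num_boost_round=5,
    init_model="9Models.txt"
)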
If you want to take information from a prior model and then train a new model on a different set of features, you could try using the predictions from the original model as the initial score to boost from.
Here’s an example of doing that in Python (which appears to be how you’re using LightGBM):
import lightgbm as lgb
from sklearn.datasets import make_regression
# full raw data: 10,000 rows, 11 features
X, y = make_regression(n_samples=10_000, n_features=11, n_informative=11)
# split into 2 sets:
# first 7,500 rows, first 10 features
X_0 = X[:7500, :10]
y_0 = y[:7500]
# final 2,500 rows, all 11 features
X_1 = X[7500:, :]
y_1 = y[7500:]
# train a model on the (7500, 10) Dataset
bst_0 = lgb.train(
    params={
        "objective": "regression",
        "num_leaves": 11
    },
    train_set=lgb.Dataset(data=X_0, label=y_0),
    num_boost_round=7
)
# get raw predictions from the first model on the final 2,500 rows,
# using only the 10 features it was trained on
preds_from_bst_0 = bst_0.predict(X_1[:, :10], raw_score=True)
# train for another 5 boosting rounds on the (2500, 11) Dataset,
# with boosting starting from the predictions of the first model
bst_1 = lgb.train(
    params={
        "objective": "regression",
        "num_leaves": 11
    },
    train_set=lgb.Dataset(
        data=X_1,
        label=y_1,
        init_score=preds_from_bst_0
    ),
    num_boost_round=5
)
preds_from_bst_1 = bst_1.predict(X_1, raw_score=True)
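One thing to be aware of with this approach: the init_score is only used during training and is not stored in bst_1, so preds_from_bst_1 above is just the second model's contribution. To get a full prediction you would add the first model's raw scores back in, something like:

# combine the baseline from the first model with the new model's raw scores
final_preds = bst_0.predict(X_1[:, :10], raw_score=True) + bst_1.predict(X_1, raw_score=True)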