I have time series data on which I need to fit a classifier, and I would like to re-train it every month as new data comes in. To keep some consistency between runs, I would prefer to warm-start the model with the previous boosting trees and refit on the new data, or just iterate a few more rounds to get the results I need. In other words, I want a more or less similar tree structure from month to month.
The `refit` function seems like the best way to do this, but unfortunately it does not appear to be available through the scikit-learn API, only on the native `Booster`. What is the best way to approach this? Here is what I have done so far:
```python
import lightgbm as lgb

params = {
    'objective': 'multiclassova',
    'num_class': 6,
    'boosting_type': 'gbdt',
    'reg_alpha': 0.0,
    'reg_lambda': 0.0,
    'num_leaves': 103,
    'feature_fraction': 0.9311573062675359,
    'bagging_fraction': 0.9568372729883741,
    'bagging_freq': 1,
    'min_child_samples': 54,
    'learning_rate': 0.056834238901176865,
    'max_depth': 99,
    'min_data_in_leaf': 2,
    'min_gain_to_split': 2.272224629201629,
    'drop_rate': 0.6143619198733702,
    'n_estimators': 232,
    'force_col_wise': True,
    'class_weight': {1: 1.2466666666666666,
                     5: 0.29088888888888886,
                     3: 0.3177184466019417,
                     0: 0.6233333333333333,
                     4: 1.3357142857142859,
                     2: 3.85},
    'seed': 42,
    'random_state': 42,
    'verbose': -1,
    'eval_metric': 'precision',
}

classifier_obj = lgb.LGBMClassifier(**params)
classifier_obj.fit(X_train, y_train, categorical_feature=categorical_data)

booster = lgb.train(
    params={},
    train_set=data_set['train_data'],
    categorical_feature=categorical_data,
    init_model=classifier_obj.booster_,  # lgb.train expects a Booster (or path), not the sklearn wrapper
    keep_training_booster=True,
    num_boost_round=1,
)
# refit returns a new Booster rather than modifying in place
refitted_booster = booster.refit(data=X_train, label=y_train)
```
I'm sure this is wrong just from looking at the outputs; can anyone point me in the right direction? Thanks!