I tried to transfer an XGBClassifier model trained in an old Python environment to a new environment. Below are the versions of the key packages in both environments.
Old environment
- python=3.6.0
- scikit-learn==0.22.2.post1
- xgboost==0.90
- pickleshare==0.7.5
- numpy==1.18.1
New environment
- python=3.11.9
- scikit-learn==1.4.2
- xgboost==2.0.3
- pickleshare==0.7.5
- numpy==1.26.4
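For completeness, this is a quick sanity check I run in each environment to confirm the versions actually loaded at runtime (a minimal sketch, using only the standard version attributes):

```python
import sys

import numpy as np
import sklearn
import xgboost

# Report the interpreter and package versions actually in use at runtime
print("python      :", sys.version.split()[0])
print("scikit-learn:", sklearn.__version__)
print("xgboost     :", xgboost.__version__)
print("numpy       :", np.__version__)
```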
When I train the model separately in the old and new environments, with the same set of hyper-parameters and the same data, the predicted probabilities are noticeably different.
I also noticed that the size of the fitted pipeline object, as well as the time it took to train the model, changed significantly:
- Size of the fitted pipeline object, old vs. new: 30 MB vs. 7 MB
- Training time, old vs. new: 4:38:46 vs. 0:06:40
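For context, this is roughly how the two numbers above can be measured; the `fit_and_report` helper below is illustrative, not the exact code I ran:

```python
import os
import pickle
import time

# Illustrative helper (not the exact code behind the numbers above):
# time the fit, pickle the result, and report the file size
def fit_and_report(pipeline, X, y, path):
    start = time.perf_counter()
    pipeline.fit(X, y)
    print(f"training time: {time.perf_counter() - start:.1f} s")
    with open(path, "wb") as f:
        pickle.dump(pipeline, f)
    print(f"pickle size: {os.path.getsize(path) / 1e6:.1f} MB")
```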
Any thoughts on the differences I observed between the models trained in the old and new environments?
Thank you in advance! I would really appreciate the help!
Below is the key Python code I used to train the model:
```python
import pickle

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from xgboost import XGBClassifier


def create_pipeline(model_params, cat_indices):
    """
    Create a pipeline.

    :param model_params: model parameters for the XGBoost classifier in the pipeline
    :param cat_indices: indices of the categorical features in X
    """
    # Impute missing categorical values with a constant, then one-hot encode
    cat_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
        ('one_hot_encoder', OneHotEncoder(handle_unknown='ignore')),
    ])
    # Encode the categorical columns; pass the remaining columns through unchanged
    preprocessor = ColumnTransformer(
        transformers=[('cat', cat_transformer, cat_indices)],
        remainder='passthrough')
    xgb = XGBClassifier(objective="binary:logistic", eval_metric="auc",
                        missing=np.nan, use_label_encoder=False)
    xgb.set_params(**model_params)
    full_pipeline_model = Pipeline(steps=[('preprocessor', preprocessor),
                                          ('model', xgb)])
    return full_pipeline_model


model_params = {
    'n_estimators': 500,
    'alpha': 9.73974803929248e-06,
    'gamma': 19,
    'lambda': 0.557185777864069,
    'learning_rate': 0.029438952461179668,
    'max_depth': 13,
    'scale_pos_weight': 5,
    'subsample': 0.687206238714661,
}

# Map categorical column names to positional indices (column names are lost after .values)
cat_indices = [X.columns.get_loc(col) for col in cat_cols]

fitted_pipeline = create_pipeline(model_params, cat_indices).fit(X.values, y.values)

with open("fitted_pipeline_final1.pkl", "wb") as f:
    pickle.dump(fitted_pipeline, f)
```
I expected the predicted probabilities from the two models to be very similar, since I used the same set of hyper-parameters and the same data. What could be the reasons that the predicted probabilities are noticeably different?
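In case it helps, this is roughly how I compare the two sets of predicted probabilities; the `.npy` file names below are placeholders for probabilities exported from each environment with `predict_proba`:

```python
import numpy as np

# Placeholder file names: positive-class probabilities saved in each environment,
# e.g. np.save("proba_old_env.npy", fitted_pipeline.predict_proba(X.values)[:, 1])
old_proba = np.load("proba_old_env.npy")
new_proba = np.load("proba_new_env.npy")

# Summarize how far apart the two environments' predictions are
diff = np.abs(old_proba - new_proba)
print(f"max abs difference : {diff.max():.4f}")
print(f"mean abs difference: {diff.mean():.4f}")
print("allclose(atol=1e-3):", np.allclose(old_proba, new_proba, atol=1e-3))
```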