I have trained an XGBoost model in Python and have a list of probabilities as output. How can I attach these probabilities to the original dataset so that I have the data and the predicted values in one DataFrame? Let’s say my original raw test DataFrame is called df_raw.
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)
model = XGBClassifier(n_estimators=1500, max_depth=5, n_jobs=-1, min_child_weight=2,
                      early_stopping_rounds=25)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
test_outputs = model.predict_proba(X_test)
To get the probabilities into the DataFrame, compute them with predict_proba() and assign the result to a new column.
Your code fragment uses XGBClassifier, so this is a classification task, and predict_proba()
returns one column of probabilities per class (two columns for binary classification). That is why I added [:, 1]:
it selects all rows of the second column, which holds the probability of the positive class.
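# Names of the features the model was trained on
# (populated when the model was fit on a DataFrame with named columns)
features = model.get_booster().feature_names
# Probability of the positive class (second column of predict_proba)
df_raw['predictions'] = model.predict_proba(df_raw[features])[:, 1]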
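If you would rather reuse the probabilities you already computed for X_test instead of predicting again on df_raw, one option is to align them by index. The following is only a minimal sketch under assumptions: it supposes X_test was split from a DataFrame that shares its index labels with df_raw, and it uses made-up data plus a hard-coded array as a stand-in for model.predict_proba(X_test). The names proba and df_with_preds are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data; the index labels are deliberately out of order,
# as they would be after a shuffled train_test_split.
df_raw = pd.DataFrame(
    {"f1": [0.1, 0.4, 0.9, 0.3], "f2": [1, 0, 1, 1]},
    index=[7, 2, 5, 0],
)

# Stand-in for X_test: a subset of rows that keeps the original index labels.
X_test = df_raw.loc[[5, 7], ["f1", "f2"]]

# Stand-in for model.predict_proba(X_test): one row per sample, one column per class.
test_outputs = np.array([[0.2, 0.8], [0.7, 0.3]])

# Wrap the positive-class column in a Series that carries X_test's index,
# so pandas aligns on index labels rather than on row position.
proba = pd.Series(test_outputs[:, 1], index=X_test.index, name="predictions")

# A left join keeps every row of df_raw; rows that were not in X_test get NaN.
df_with_preds = df_raw.join(proba)
print(df_with_preds)
```

Aligning on the index avoids silently mismatching rows when X_test is in a different order than df_raw. If the index was reset somewhere between df_raw and X_test, this alignment no longer holds, and predicting directly on df_raw as shown above is the safer route.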