I'm getting R^2 values that don't make sense to me.
I'd appreciate it if you could look at the code first.
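from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, train_test_split
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBRegressor
# Hold out 30% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.3, shuffle=True, random_state=42)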
parameters = {
    'max_depth': Integer(3, 30),
    'n_estimators': Integer(100, 1000),
    'learning_rate': Real(0.01, 0.5),
    'colsample_bytree': Real(0.1, 1.0),
    'subsample': Real(0.1, 1.0),
    'min_child_weight': Integer(1, 10),
}
xgb_model = XGBRegressor()
cv = KFold(n_splits=10, shuffle=True, random_state=42)
bayes_search = BayesSearchCV(xgb_model, parameters, cv=cv, n_jobs=-1, scoring='r2', verbose=5, random_state=42)
bayes_search.fit(X_train, y_train)
best_params = bayes_search.best_params_
print("Best Hyperparameters:", best_params)
# CV R^2 score
cv_results = bayes_search.cv_results_
cv_mean_score = cv_results['mean_test_score'][bayes_search.best_index_]
print("CV R squared Score:", cv_mean_score)
# Train R^2
final_model = XGBRegressor(**best_params)
final_model.fit(X_train, y_train)
y_pred1 = final_model.predict(X_train)
train_r2 = r2_score(y_train, y_pred1)
print("Train R squared:", train_r2)
# Test R^2
y_pred = final_model.predict(X_test)
test_r2 = r2_score(y_test, y_pred)
print("Test R squared:", test_r2)
Running this code gave the following results:
CV R squared Score: 0.6303, Train R squared: 0.7647, Test R squared: 0.6337
Here are my questions:
- I tried different random states, and each time I got train R^2 > test R^2 > CV R^2. My intuition is that the train R^2 and CV R^2 should be equal, because they use the same hyperparameters, and that both should always be higher than the test R^2, which is computed on the 30% of the dataset that was held out. So I don’t understand why I get train R^2 > test R^2 > CV R^2.
- I tried Bayesian search and grid search, XGBoost and random forest, and many other methods, but I couldn’t get R^2 above 0.8. Should I use an ensemble of different models to increase it? I would appreciate any suggestions.
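One way to examine the comparison in the first question is to look at the per-fold scores separately. In scikit-learn's cross-validation, each fold's score is computed on the held-out part of that fold, by a model fitted on the remaining part, so it is not the same quantity as an R^2 measured on the data the model was fitted to. The following is a minimal sketch only: the synthetic data from `make_regression` and the `RandomForestRegressor` are stand-ins chosen so the example runs with scikit-learn alone, and they are assumptions, not the dataset or the tuned XGBoost model from the code above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate, train_test_split

# Synthetic stand-in data (assumption: NOT the dataset from the question).
X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=42
)

# Stand-in model (assumption: a random forest instead of the tuned XGBRegressor).
model = RandomForestRegressor(n_estimators=50, random_state=42)
cv = KFold(n_splits=10, shuffle=True, random_state=42)

# return_train_score=True also reports R^2 on the part of each fold used for fitting.
scores = cross_validate(
    model, X_train, y_train, cv=cv, scoring="r2", return_train_score=True
)
fold_train_r2 = scores["train_score"]  # scored on the data each fold model was fitted on
fold_valid_r2 = scores["test_score"]   # scored on the held-out part of each fold

print("Per-fold validation R^2:", np.round(fold_valid_r2, 3))
print("Mean fit-fold R^2:", fold_train_r2.mean())
print("Mean validation R^2:", fold_valid_r2.mean())
print("Std of validation R^2:", fold_valid_r2.std())

# Refit on the full training split and score both splits, as in the question.
model.fit(X_train, y_train)
print("Train R^2:", model.score(X_train, y_train))
print("Test R^2:", model.score(X_test, y_test))
```

The spread of the per-fold validation scores gives a rough sense of how much a single held-out estimate, such as the test R^2, can vary from split to split.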
If I’ve misunderstood a concept or made a mistake in the code, please let me know.
It’s a long post, so thank you for reading.