I’m using an XGBoost regressor to model a temperature in a particular system. The results on the train, validation and test sets are good enough to move to production (RMSE = 3.87 °C, R² = 0.87, MAPE = 4.2%), with residuals in a band of ±10 °C. The problem occurs when I add new data for prediction: the results are far more accurate than expected. In the attached picture I have marked the new data in red (2014/04/15 to date), where it is easy to see how accurate the predictions are, while the rest (2014/01/01 to 2014/04/15) is part of the train/val/test data.
- Split of data: 75-15-15
- MinMaxScaler
- XGBRegressor(n_estimators=70, max_depth=10, eta=0.1, colsample_bytree=0.7)
Hopefully someone can shed some light on this.
I’ve tried several splits, random shuffling and different hyperparameters, and nothing seems to work. I expected the residuals on the new data to be similar to those on the test set.
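To make the comparison concrete, here is a minimal sketch of how the same metrics (RMSE, R², MAPE and the residual band) could be computed separately for the test period and for the new period. The function name `regression_metrics` and the numbers are made up purely for illustration; they are not my real data.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, R2, MAPE (%) and the residual range (prediction minus truth)."""
    n = len(y_true)
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
        "mape": 100.0 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n,
        "res_min": min(residuals),
        "res_max": max(residuals),
    }

# Illustrative values only (hypothetical temperatures in °C).
test_true = [50.0, 60.0, 70.0, 80.0]
test_pred = [54.0, 57.0, 73.0, 76.0]
new_true = [50.0, 60.0, 70.0, 80.0]
new_pred = [50.5, 59.5, 70.5, 79.5]

test_metrics = regression_metrics(test_true, test_pred)
new_metrics = regression_metrics(new_true, new_pred)

for name, m in (("test", test_metrics), ("new", new_metrics)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

If the model behaved the same way on both periods, the two dictionaries should be of similar magnitude; in my case the new period looks much better than the test period.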