Below is the link to the dataset in focus. I want to split the dataset into training and test sets, use the training set to build and tune the model, and use the test set to evaluate performance. Before doing that, I want to make sure the original dataset has no noise, collinearity, or major outliers that I would need to address, for example by transforming the data with techniques like Box-Cox or by looking at VIF to eliminate highly correlated predictors.
https://www.kaggle.com/datasets/joaofilipemarques/google-advanced-data-analytics-waze-user-data
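A minimal Python sketch of that order of operations, assuming numpy and scipy are installed: split first, then compute VIF and the Box-Cox lambda on the training set only, and reuse that lambda on the test set so no test information leaks into model building. The data here is synthetic with hypothetical columns `x1`, `x2`, `x3`, not the Waze columns. VIF is computed directly from its definition, VIF_j = 1 / (1 - R²_j); a VIF above roughly 5 to 10 is a common rule of thumb for problematic collinearity. In R, `car::vif()` and `MASS::boxcox()` cover the same two checks.

```python
import numpy as np
from scipy import stats

def train_test_split(X, y, test_size=0.2, seed=0):
    """Shuffle row indices once and hold out a fraction as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_size * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j
    on all other columns plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic stand-in data (hypothetical; not the Waze dataset).
rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
# Positive, right-skewed response, so Box-Cox is applicable.
y = np.exp(0.5 * x1 + 0.3 * x3 + rng.normal(scale=0.3, size=n))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, seed=1)

# Diagnostics use the training set only; the test set stays untouched.
print("VIF (x1, x2, x3):", np.round(vif(X_train), 1))

# Box-Cox requires strictly positive data; lambda is estimated by maximum likelihood.
y_train_bc, lam = stats.boxcox(y_train)
# Apply the SAME lambda to the test response instead of re-estimating it.
y_test_bc = stats.boxcox(y_test, lmbda=lam)
print("Box-Cox lambda:", round(lam, 2))
```

Here `x1` and `x2` should show large VIFs, so one of them would be a candidate to drop; a lambda near 0 suggests a log transform of the response.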
When I fit the original dataset to a regression model in Minitab, I get the attached residual plot. The residuals don't look normal. Does this mean there is high correlation among the predictors, or that the relationship between the response and the predictors is nonlinear? How should I approach this? What would my strategy be in Python, Minitab, and R? Explanations for all three are appreciated if possible.
[Image: residual plot used to check normality]
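One point worth separating out: collinearity by itself does not make residuals non-normal, because OLS residuals depend only on the space spanned by the predictors, not on how correlated they are; collinearity mainly inflates coefficient standard errors. Non-normal residuals more often point to a skewed response or a misspecified (for example nonlinear) mean. The sketch below, assuming numpy and scipy and using synthetic data rather than the Waze data, fits the same predictors to a raw skewed response and to its log, then compares a Shapiro-Wilk test, the correlation of the normal Q-Q plot, and a crude funnel-shape check. In R, `plot(lm_fit)`, `qqnorm()`/`qqline()`, and `shapiro.test()` give the equivalent diagnostics.

```python
import numpy as np
from scipy import stats

def ols_residuals(X, y):
    """Fit y = b0 + X b by least squares and return (fitted, residuals)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    return fitted, y - fitted

# Synthetic data (an assumption, not the Waze data): the true relationship is
# multiplicative, so raw y is right-skewed and a linear fit on it is misspecified.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 2))
y = np.exp(0.8 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.4, size=n))

results = {}
for label, response in [("raw y", y), ("log y", np.log(y))]:
    fitted, resid = ols_residuals(X, response)
    # Shapiro-Wilk: a small p-value means the residuals are unlikely to be normal.
    _, sw_p = stats.shapiro(resid)
    # Without plot=..., probplot returns the normal Q-Q points plus a line fit;
    # r close to 1 means the Q-Q plot is close to a straight line.
    _, (_, _, qq_r) = stats.probplot(resid, dist="norm")
    # Correlation of |residual| with fitted value: positive suggests a funnel shape.
    spread_corr = np.corrcoef(fitted, np.abs(resid))[0, 1]
    results[label] = (sw_p, qq_r, spread_corr)
    print(f"{label}: Shapiro p={sw_p:.3g}, Q-Q r={qq_r:.3f}, |resid|-fitted corr={spread_corr:.2f}")
```

If a transformation of the response straightens the Q-Q plot, the issue was the response scale; if patterns remain in residuals versus fitted values or versus individual predictors, adding nonlinear terms for those predictors is the next thing to try.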