
Final Predictions accuracy of my ML Binary Classification Model is horrible

So I am competing in a Kaggle competition (https://www.kaggle.com/competitions/playground-series-s4e8) where we have to predict whether a mushroom is poisonous or not based on the data provided.
The issue I am facing is that my models perform well on the training and validation sets (around 98-99% accuracy), but they fall apart when I actually submit the final predictions for the competition.
The best accuracy I have gotten so far, using the Random Forest model, was 52%, and the rest of my submissions performed substantially worse. Since the models perform well inside the notebook on the labelled data,
I assume the issue is with the way I am handling the data in general: I did not implement techniques like feature engineering, and I am not sure whether the way I converted the categorical data to numeric data works correctly.
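A common cause of exactly this train/submission gap is encoding the categorical columns separately for train and test, so the same category ends up with different integers in each set. A minimal sketch of a safe way to do it, assuming toy stand-in frames for the competition data (the column names here are hypothetical):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy frames standing in for the competition's train/test splits.
train = pd.DataFrame({"cap-color": ["red", "brown", "red"],
                      "odor": ["foul", "none", "foul"]})
test = pd.DataFrame({"cap-color": ["brown", "yellow"],   # "yellow" never seen in train
                     "odor": ["none", "foul"]})

# Fit the encoder on train ONLY, then apply the same mapping to test.
# A category that appears only in test is mapped to -1 instead of crashing
# or silently receiving a code that means something else in train.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_enc = enc.fit_transform(train)
test_enc = enc.transform(test)
```

If instead the encoder (or `pd.get_dummies`) is fit on train and test independently, the integer-to-category mapping can differ between the two, which would produce exactly the near-random submission accuracy described above.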
And as mentioned before, I am using the Random Forest and/or XGBoost models, which are well known to be much less prone to overfitting than other models.
I also ran multiple iterations of several models to find the ones with the best parameters (as evident from the code below), which makes overfitting even less likely.
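For reference, the kind of parameter search described here can be sketched with scikit-learn's `GridSearchCV`; the grid values and the synthetic data below are illustrative assumptions, not the post's actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic binary-classification stand-in for the competition data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical parameter grid; the real search would cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, None]}

# Cross-validated search: each candidate is scored on held-out folds,
# which guards against picking parameters that only fit the training data.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
best = search.best_params_
```

Note that a search like this only protects against overfitting within the labelled data; it cannot catch a train/test preprocessing mismatch, which is why cross-validation scores can stay high while the submission score collapses.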