Tag Archive for python, machine-learning, data-analysis, random-forest

Is this model overfitting, or is the quality of the data too bad?

I'm currently working on a machine learning project. It's a supervised learning problem: given data about an animal (keeping, size, weight, …), my goal is to predict ingredients (energy, vitamins, etc.). First, I cleaned the data and encoded the categorical features with LabelEncoding. I chose Random Forest as the algorithm because I have read that trees handle mixed data (categorical and continuous) well. I trained the model with several parameter settings and noticed that I get excellent training results but very bad test results. In my opinion this indicates overfitting: the model is learning the noise. I know of two remedies for that: more data, and reducing the complexity of the model. I have tried PCA, removing some features, and changing the hyperparameters (max_depth set to 15), but none of these helped. When I reduced max_depth, the training error went up, but the test error stayed massively high.
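One way to separate "the model overfits" from "the features carry too little signal" might be to compare the forest against a predict-the-mean baseline. The sketch below is only an illustration, not my actual pipeline: it assumes a regression target and scikit-learn, the helper name `diagnose` is hypothetical, and the two datasets are synthetic stand-ins (one with real signal, one where the target is pure noise).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split


def diagnose(X, y, seed=0):
    """Hypothetical helper: R^2 of a random forest vs. a mean-only baseline."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    baseline = DummyRegressor(strategy="mean")  # always predicts the training mean
    forest.fit(X_train, y_train)
    baseline.fit(X_train, y_train)

    # 5-fold cross-validation on the training part only, so the held-out
    # test set stays untouched.
    cv_scores = cross_val_score(
        RandomForestRegressor(n_estimators=100, random_state=seed),
        X_train, y_train, cv=5, scoring="r2",
    )
    return {
        "train_r2": float(forest.score(X_train, y_train)),
        "test_r2": float(forest.score(X_test, y_test)),
        "baseline_test_r2": float(baseline.score(X_test, y_test)),
        "cv_r2_mean": float(cv_scores.mean()),
    }


# Synthetic case 1 (assumption): the features really explain the target.
X_signal, y_signal = make_regression(
    n_samples=300, n_features=10, n_informative=5, noise=10.0, random_state=0
)

# Synthetic case 2 (assumption): the target is independent of the features.
rng = np.random.default_rng(0)
X_noise = rng.normal(size=(300, 10))
y_noise = rng.normal(size=300)

signal_report = diagnose(X_signal, y_signal)
noise_report = diagnose(X_noise, y_noise)

for name, report in [("signal", signal_report), ("noise", noise_report)]:
    print(name, {key: round(value, 3) for key, value in report.items()})
```

In the pure-noise case the forest still reaches a high training score while its test and cross-validation scores stay near the baseline. If my real data shows the same pattern no matter how max_depth is set, that would point toward the features (or their encoding) rather than toward model complexity.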