I'm running a gradient boosted classifier in Python:
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(n_estimators=100, max_depth=8, random_state=1000)
I'm confused about why there is a random_state argument. I understand why it would be needed in a random forest ensemble, for example, but doesn't the 'gradient' part of this algorithm eliminate the randomness? What is random in gradient boosting?
I have 528 data points with 2 numerical features and 2 ordinal categorical features. Each iteration of the algorithm trains with 6-fold cross-validation on 80% of the data, chosen randomly, and then the model is tested on the remaining 20%. Changing the random_state doesn't change my overall model accuracy, but it sometimes changes my feature importances quite drastically. I can get around this by averaging the feature importances over many iterations of model training, which gives a consistent outcome regardless of random state, but this still doesn't make sense to me: I'm unsure what is random in the algorithm itself.
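For concreteness, here is a minimal sketch of the averaging workaround I'm describing, with make_classification standing in for my real dataset and the 6-fold cross-validation omitted for brevity:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for my 528-point, 4-feature dataset
X, y = make_classification(n_samples=528, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

importances = []
for seed in range(20):  # repeat the pipeline with different random states
    # Random 80/20 split, re-drawn each iteration
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    clf = GradientBoostingClassifier(n_estimators=100, max_depth=8,
                                     random_state=seed)
    clf.fit(X_train, y_train)
    importances.append(clf.feature_importances_)

# Individual runs vary, but the mean importances come out stable
mean_importances = np.mean(importances, axis=0)
print(mean_importances)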