I have trained a random forest model and am checking its feature importances. clf2.feature_importances_ gives the importance of each feature, and the corresponding labels come from the columns of the training dataset, X_train.columns.
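import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Impurity-based (MDI) importance of each feature for the whole forest
importances = clf2.feature_importances_
# Spread of each feature's importance across the individual trees
std = np.std([tree.feature_importances_ for tree in clf2.estimators_], axis=0)
forest_importances = pd.Series(importances, index=X_train.columns)
fig, ax = plt.subplots(figsize=(15, 5))
# Only the Series is sorted here; std is a plain NumPy array in its original order
forest_importances.sort_values(ascending=False).plot.bar(yerr=std, ax=ax)
ax.set_title("Feature importances using MDI")
ax.set_ylabel("Mean decrease in impurity")
fig.tight_layout()
p = plt.xticks(rotation=45)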
In this code I sorted the forest_importances Series but did not sort the std array, yet the output still looks correct.
I know that pandas Series are aligned on their index before most operations, but in this case the std array has no index. How is this possible?
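To illustrate the alignment behaviour I am referring to, here is a minimal sketch with made-up feature names and values (not my real data). It only shows how arithmetic treats a Series versus a plain array. Which of the two behaviours plot.bar(yerr=...) follows for an array is exactly what I am unsure about.

```python
import numpy as np
import pandas as pd

# Hypothetical feature names and values, for illustration only
imp = pd.Series([0.2, 0.5, 0.3], index=["a", "b", "c"])
err_series = pd.Series([0.02, 0.05, 0.03], index=["a", "b", "c"])
err_array = err_series.to_numpy()  # same numbers, but no index labels

sorted_imp = imp.sort_values(ascending=False)  # row order becomes b, c, a

# Series + Series aligns on index labels, so each feature keeps its own error
aligned = sorted_imp + err_series

# Series + ndarray is positional: element i of the array is paired with row i
positional = sorted_imp + err_array
```

In the first sum, feature "b" is paired with its own error (0.05). In the second, it is paired with whatever sits first in the array (0.02).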
I tried another version of the code, which gives the same output:
forest_importances = pd.DataFrame(
    {
        "Feature Importance": clf2.feature_importances_,
        "Estimators std": np.std([tree.feature_importances_ for tree in clf2.estimators_], axis=0),
    },
    index=X_train.columns,
)
fig, ax = plt.subplots(figsize=(15, 5))
forest_importances.sort_values(by="Feature Importance", ascending=False).plot.bar(ax=ax, yerr="Estimators std")
_ = ax.set_title("Feature importances using MDI")
_ = ax.set_ylabel("Mean decrease in impurity")
_ = fig.tight_layout()
_ = plt.xticks(rotation=45)
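One way to avoid relying on implicit behaviour in either version would be to make the pairing explicit: give the standard deviations the same index as the importances, sort the importances, then reorder the deviations to match. The sketch below assumes made-up numbers in place of clf2 and X_train, and align_errors is a hypothetical helper name, not part of pandas or scikit-learn.

```python
import numpy as np
import pandas as pd


def align_errors(values, errors):
    """Sort values in descending order and return errors reordered to match."""
    # Attach the labels of `values` to the raw error array (same original order)
    err = pd.Series(np.asarray(errors), index=values.index)
    ordered = values.sort_values(ascending=False)
    # reindex looks each label up, so every error follows its own feature
    return ordered, err.reindex(ordered.index)


# Made-up stand-ins for feature_importances_ and the per-tree std
importances = pd.Series([0.2, 0.5, 0.3], index=["f0", "f1", "f2"])
std = np.array([0.02, 0.05, 0.03])

ordered, ordered_std = align_errors(importances, std)
```

After this, ordered and ordered_std are in the same order, so passing ordered_std.to_numpy() as yerr to ordered.plot.bar should pair each bar with its own error bar regardless of how the plotting code matches them.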