I want to know the feature importance for my data, so I use permutation_importance. In the result, the features seem to be already encoded, so I want to recover their names using get_feature_names_out. It raises the error 'StandardScaler' object has no attribute 'get_feature_names_out'. If I try to interpret the result manually, I am afraid I will get the order wrong. I expect the order to be (3, 0, 1, 2): smoker, age, bmi, sex.
Here is the code:
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.inspection import permutation_importance
# Prepare data
X = df[['age', 'bmi', 'sex', 'smoker']]
y = df['charges']
# Define the preprocessor
categorical_transformer = OneHotEncoder(drop='first', sparse=False)
numerical_transformer = StandardScaler()
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, ['age', 'bmi']),
        ('cat', categorical_transformer, ['sex', 'smoker'])
    ]
)
# Preprocess the data
X_preprocessed = preprocessor.fit_transform(X)
# Extract feature names
num_features = numerical_transformer.get_feature_names_out(['age', 'bmi'])
cat_features = categorical_transformer.get_feature_names_out(['sex', 'smoker'])
feature_names = np.concatenate([num_features, cat_features])
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X_preprocessed, y, test_size=0.2, random_state=42)
# Train KNeighborsRegressor
knn_regressor = KNeighborsRegressor()
reg_model = knn_regressor.fit(X_train, y_train)
# Evaluate feature importance using permutation importance
results = permutation_importance(knn_regressor, X_test, y_test, n_repeats=10, random_state=42, scoring='neg_mean_squared_error')
# Display feature importances with names
for i, importance in enumerate(results.importances_mean):
    print(f"Feature '{feature_names[i]}': Importance: {importance}")
# Same importances, sorted from most to least important
sorted_indices = np.argsort(results.importances_mean)
for i in sorted_indices[::-1]:
    print(f"Feature '{feature_names[i]}', Importance: {results.importances_mean[i]}")
How can I get the feature names back, in the same order as the importances?