I would like to apply one-hot encoding and label encoding to my dataset using a Pipeline in front of my random forest model. I have written a function that uses scikit-learn's Pipeline together with OneHotEncoder and LabelEncoder.
def create_pipeline(self, train_feature, train_label, encoding_method, model):
    if encoding_method == EncodingMethod.ONE_HOT:
        categorical_cols = [col for col in train_feature.columns if train_feature[col].dtype == 'object']
        categorical_transformer = Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore',
                                                                           sparse=False))])
        label_encoder = Pipeline(steps=[('label', LabelEncoder())])
        preprocessor = ColumnTransformer(transformers=[('category', categorical_transformer, categorical_cols),
                                                       ('label', label_encoder, [train_label.name])],
                                         remainder='passthrough')
    elif encoding_method == EncodingMethod.LABEL:
        categorical_cols = [col for col in train_feature.columns if train_feature[col].dtype == 'object']
        categorical_transformer = Pipeline(steps=[('label', LabelEncoder())])
        preprocessor = ColumnTransformer(transformers=[('category', categorical_transformer, categorical_cols),
                                                       ('label', categorical_transformer, [train_label.name])],
                                         remainder='passthrough')
    pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                               ('classifier', model)])
    return pipeline
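For reference: LabelEncoder is documented for encoding target labels, and its fit/transform methods accept only y. Pipeline and ColumnTransformer call their steps with a 2-D X (plus y), so a LabelEncoder step there typically fails with a TypeError; the label column (train_label.name) is also not a column of train_feature, so ColumnTransformer cannot select it. A minimal sketch of a feature-only variant, assuming OrdinalEncoder is an acceptable stand-in for "label encoding" of feature columns. The standalone create_pipeline, the string values "one_hot"/"ordinal" (in place of the EncodingMethod enum) and the toy data are hypothetical. It also omits OneHotEncoder's sparse argument, which newer scikit-learn releases renamed to sparse_output.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder


def create_pipeline(train_feature, encoding_method, model):
    """Hypothetical feature-only variant of create_pipeline.

    encoding_method is assumed to be "one_hot" or "ordinal", standing in for
    EncodingMethod.ONE_HOT / EncodingMethod.LABEL. The target is not touched.
    """
    # Treat every non-numeric column as categorical (also covers pandas string dtypes)
    categorical_cols = [
        col for col in train_feature.columns if not is_numeric_dtype(train_feature[col])
    ]
    if encoding_method == "one_hot":
        # Categories unseen during fit become all-zero columns at predict time
        encoder = OneHotEncoder(handle_unknown="ignore")
    elif encoding_method == "ordinal":
        # OrdinalEncoder accepts 2-D X, so unlike LabelEncoder it can sit in a
        # ColumnTransformer; unseen categories are mapped to -1
        encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    else:
        raise ValueError(f"unknown encoding_method: {encoding_method!r}")

    preprocessor = ColumnTransformer(
        transformers=[("category", encoder, categorical_cols)],
        remainder="passthrough",  # numeric columns go through unchanged
    )
    return Pipeline(steps=[("preprocessor", preprocessor), ("classifier", model)])


# Usage on a tiny made-up frame with one categorical and one numeric column
X = pd.DataFrame({"colour": ["red", "blue", "red", "green"], "size": [1.0, 2.0, 3.0, 4.0]})
y = ["a", "b", "a", "b"]
pipe = create_pipeline(X, "ordinal", RandomForestClassifier(random_state=0))
pipe.fit(X, y)
print(pipe.predict(X))
```

The target is deliberately left out of the pipeline; it is encoded (or not) separately.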
I use the function above in my model script (using the Iris dataset; partial code below). I expected y_train (the species column) to be encoded as 0, 1, 2, etc., but when I print it, it still contains the original categorical values.
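One point that may explain the behaviour: a Pipeline applies its transformers to X only. y is handed to the final estimator as-is, and train_test_split simply returns slices of whatever label array it receives. If integer codes are wanted, LabelEncoder can be applied to the target directly. A minimal sketch; the four-row Series below is a made-up stand-in for train_label:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for train_label (the species column)
train_label = pd.Series(
    ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"],
    name="species",
)

label_encoder = LabelEncoder()
# fit_transform takes only y; classes are sorted, so the codes follow alphabetical order
encoded = label_encoder.fit_transform(train_label)
print(encoded.tolist())                  # [0, 1, 2, 0]
print(label_encoder.classes_.tolist())   # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']

# inverse_transform maps integer codes back to the original names
print(label_encoder.inverse_transform([2, 0]).tolist())  # ['Iris-virginica', 'Iris-setosa']
```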
Partial script:
df = self._dataset.as_dataframe()
train_feature = df[self._train_configs.feature_cols]
train_label = df[self._train_configs.target_col]
# build the pipeline with the function above
self._model = self.create_pipeline(train_feature, train_label, self._train_configs.encoding_method, self._model)
print("\n")
print("This is model")
print(self._model)
X_train, X_test, y_train, y_test = train_test_split(train_feature, train_label, random_state=0, train_size=0.8)
print("\n")
print("This is y_train")
print(y_train)
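Note also that in this script train_test_split is called on the raw train_feature and train_label, and the pipeline is never fitted before y_train is printed, so y_train could not have been changed by it. A sketch of one way to order the steps, under stated assumptions: the Iris frame is rebuilt from scikit-learn's bundled load_iris (species names there are "setosa" etc. rather than "Iris-setosa") as a stand-in for self._dataset.as_dataframe(), and since all Iris features are numeric, no feature encoder is needed here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Assumption: rebuild an Iris frame with string species names; load_iris ships with scikit-learn
iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")
df["species"] = iris.target_names[iris.target.to_numpy()]

feature_cols = list(iris.feature_names)  # hypothetical stand-in for feature_cols
train_feature = df[feature_cols]
train_label = df["species"]

# Encode the target explicitly, before splitting; a Pipeline never transforms y
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(train_label)

X_train, X_test, y_train, y_test = train_test_split(
    train_feature, y, random_state=0, train_size=0.8
)
print(y_train[:10])  # integer codes (0, 1, 2) instead of species names

model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Map predicted codes back to species names when reporting results
predicted_names = label_encoder.inverse_transform(model.predict(X_test.iloc[:5]))
print(predicted_names)
```

With categorical features, the same ordering applies: encode y first, then pass the pipeline X_train and the already-encoded y_train.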
Output of print(y_train):
137 Iris-virginica
84 Iris-versicolor
27 Iris-setosa
127 Iris-virginica
132 Iris-virginica
…
9 Iris-setosa
103 Iris-virginica
67 Iris-versicolor
117 Iris-virginica
47 Iris-setosa
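It may also be worth knowing that scikit-learn classifiers such as RandomForestClassifier accept string class labels directly: classes_ holds the sorted labels and predict returns them, so encoding the species column is optional for this particular model. A short sketch, again assuming the bundled Iris data as a stand-in for the dataset above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X = iris.data
# String class labels, like the species column ("setosa" instead of "Iris-setosa")
y = iris.target_names[iris.target]

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.classes_)          # the sorted string labels learned from y
print(clf.predict(X[:3]))    # predictions come back as strings, not integers
```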