I want to create a pipeline that preprocesses both the training features and the target, and then trains the model. The dataset looks something like this:
   v1 v2 target
0   1  a    yes
1   5  c     no
2   3  f    yes
I have a pipeline like this:
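from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

num_cols = ['v1']
cat_cols = ['v2']
clf = DecisionTreeClassifier()

# Numeric features: fill missing values with the mean, then scale to [0, 1]
num_transformer = Pipeline(steps=[
    ('impute', SimpleImputer(strategy='mean')),
    ('scale', MinMaxScaler())
])
# Categorical features: fill missing values with the most frequent category
cat_transformer = Pipeline(steps=[('impute', SimpleImputer(strategy='most_frequent'))])
col_trans = ColumnTransformer(transformers=[
    ('num_pipeline', num_transformer, num_cols),
    ('cat_pipeline', cat_transformer, cat_cols)
])
clf_pipeline = Pipeline(steps=[
    ('col_trans', col_trans),
    ('model', clf)
])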
The idea is to label-encode the target so that it becomes:
target
1
0
1
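For reference, this mapping is what scikit-learn's `LabelEncoder` would produce for the example column, since it sorts the classes alphabetically. A minimal sketch, using only the three example rows:

```python
from sklearn.preprocessing import LabelEncoder

# Target column from the example dataset
y = ["yes", "no", "yes"]

le = LabelEncoder()
# Classes are sorted, so "no" -> 0 and "yes" -> 1
y_encoded = le.fit_transform(y)

print(le.classes_.tolist())  # ['no', 'yes']
print(y_encoded.tolist())    # [1, 0, 1]
```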
If possible, it would also be useful to decode the predictions back to the original labels.
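One possible way to get both behaviours is a small wrapper estimator that encodes the target in `fit` and decodes it in `predict`. The sketch below is only an illustration under some assumptions: `LabelEncodedClassifier` is a hypothetical helper name (not part of scikit-learn), and a `OneHotEncoder` step has been added to the categorical branch because the tree cannot consume raw strings such as 'a' or 'c' as features.

```python
import pandas as pd
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, OneHotEncoder
from sklearn.tree import DecisionTreeClassifier


class LabelEncodedClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical helper: label-encode y before fit, decode after predict."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Learn the label <-> integer mapping from the training target
        self.label_encoder_ = LabelEncoder()
        y_enc = self.label_encoder_.fit_transform(y)
        # Fit a fresh copy of the wrapped estimator on the encoded target
        self.estimator_ = clone(self.estimator).fit(X, y_enc)
        self.classes_ = self.label_encoder_.classes_
        return self

    def predict(self, X):
        # Predict integers, then map them back to the original labels
        y_enc = self.estimator_.predict(X)
        return self.label_encoder_.inverse_transform(y_enc)


df = pd.DataFrame({
    "v1": [1, 5, 3],
    "v2": ["a", "c", "f"],
    "target": ["yes", "no", "yes"],
})

num_transformer = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", MinMaxScaler()),
])
cat_transformer = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="most_frequent")),
    # Assumption: one-hot encoding added so the tree receives numeric input
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
col_trans = ColumnTransformer(transformers=[
    ("num_pipeline", num_transformer, ["v1"]),
    ("cat_pipeline", cat_transformer, ["v2"]),
])
clf_pipeline = Pipeline(steps=[
    ("col_trans", col_trans),
    ("model", DecisionTreeClassifier(random_state=0)),
])

model = LabelEncodedClassifier(clf_pipeline)
model.fit(df[["v1", "v2"]], df["target"])
preds = model.predict(df[["v1", "v2"]])
print(preds.tolist())  # ['yes', 'no', 'yes'] on the training rows
```

Note that scikit-learn classifiers accept string class labels directly and return them from `predict`, so calling `clf_pipeline.fit(X, df["target"])` without any wrapper may already be enough. The wrapper is mainly useful when the integer-encoded target is needed explicitly.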