I am trying to impute a categorical feature (Embarked, from the Titanic dataset) with SimpleImputer and then one-hot encode it. The problem is that the imputer produces a column that still holds the original string values. I tried setting the copy parameter to False, but the string column is still created. When I use this in a pipeline with logistic regression, the string values break the code. How can I avoid producing this copy of the original values?
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numerical_transformer = ColumnTransformer([
    # impute missing values in the age column
    ('impute_age', SimpleImputer(copy=False), [0]),
    # scale the age and fare columns
    ('scale_age', StandardScaler(), [0]),
    ('scale_fare', StandardScaler(), [1])
], remainder='passthrough')

categorical_transformer = ColumnTransformer([
    # impute missing values in the embarked column
    ('impute_embark', SimpleImputer(strategy='most_frequent', copy=False), [3]),
    # one-hot encode the embarked column
    ('ohe_embarked', OneHotEncoder(sparse_output=False, handle_unknown='ignore', drop='first'), [3]),
], remainder='passthrough')

log_reg_transformer = LogisticRegression()
pipe = make_pipeline(numerical_transformer, categorical_transformer, log_reg_transformer)
pipe.fit(X_train, y_train)
The error I am getting is:
ValueError: could not convert string to float: 'S'
Thanks in advance.
I tried setting the copy parameter to False so that SimpleImputer would not create a copy of the original values, but it made no difference.