I am trying to apply one-hot encoding to a pandas dataframe column. The encoder generates 1582 features, but when I try to merge these features into my original dataframe, I get the following error message:
ValueError: Shape of passed values is (54961, 1582), indices imply (54961, 4067)
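For context, pandas raises a ValueError of this form when the number of column labels passed to pd.DataFrame differs from the width of the data. A small sketch with made-up shapes (the array and the names below are purely illustrative, not taken from my data):

```python
import numpy as np
import pandas as pd

values = np.zeros((3, 2))    # data with 2 columns
names = ["a", "b", "c"]      # but 3 column labels

try:
    pd.DataFrame(values, columns=names)
    raised = False
except ValueError as exc:
    # The message reports the data shape and the shape implied by the labels.
    raised = True
    message = str(exc)
    print(message)
```

In the error above, this would suggest that 4067 column names were supplied for an array that is only 1582 columns wide.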
Below is the code (it is not my own work, but a compilation of snippets from others on this board):
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
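# output is sparse by default; the `sparse` argument was renamed to `sparse_output` in newer scikit-learn releases
encoder = OneHotEncoder(handle_unknown='infrequent_if_exist')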
encoded_data = encoder.fit_transform(df1.columnA.sort_values().values.reshape(-1, 1))
# categories learned during fit, to be reused on unseen data (an attribute of the fitted encoder, not of the transformed matrix)
attribute_columnA = encoder.categories_
# convert the sparse matrix to a dense array
encoded_data_array = encoded_data.toarray()
# build a dataframe from the encoded array (feature names come from the fitted encoder)
oh_df = pd.DataFrame(encoded_data_array, columns=encoder.get_feature_names_out())
Thank you for responding to this message.