I have a label column in a pandas DataFrame with far too many distinct values. I want to reduce them by mapping some labels onto a smaller set of labels I chose.
The result should look like this (both columns have dtype string):
| old_label | new_label |
|---|---|
| health | health |
| healthy_tips | health |
| rejuvenation | health |
| government | government |
| senate | government |
| governor | government |
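If every old label is known in advance, the table itself can serve as an exact-match lookup. A minimal sketch, assuming the labels above are the complete set of old labels to rename; `LABEL_MAP` and the extra `"sports"` row are hypothetical, added only to show what happens to an unmapped label:

```python
import pandas as pd

# The mapping from the table above as a plain dict (exact matches only).
LABEL_MAP = {
    "health": "health",
    "healthy_tips": "health",
    "rejuvenation": "health",
    "government": "government",
    "senate": "government",
    "governor": "government",
}

# Hypothetical data: the table's old labels plus one unmapped label.
data = pd.DataFrame({"old_label": list(LABEL_MAP) + ["sports"]})

# map() yields NaN for labels missing from the dict; fillna() puts the original label back.
data["new_label"] = data["old_label"].map(LABEL_MAP).fillna(data["old_label"])
print(data["new_label"].tolist())
# ['health', 'health', 'health', 'government', 'government', 'government', 'sports']
```

This only works for exact matches; labels that merely contain a keyword (the substring matching the function below is meant to do) need a different approach.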
So I wrote this function, which is supposed to check each label for certain substrings:
```python
def relabel(x):
    for i in x:
        if ("health" or "rejuvenation") in i:
            return "health"
        elif ("gover" or "senate") in i:
            return "government"
        else:
            return i
```
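The function combines two Python pitfalls, which can be checked in isolation. This is a minimal sketch that reuses the posted `relabel` unchanged; `keyword` and `chars` are names introduced only for the demonstration:

```python
# Pitfall 1: `or` between two non-empty strings returns the first operand,
# so ("health" or "rejuvenation") is simply "health"; "rejuvenation" is never tested.
keyword = ("health" or "rejuvenation")
print(keyword)  # health

# Pitfall 2: iterating over a string yields single characters,
# so `i` inside the loop is one letter, never the whole label.
chars = [i for i in "senate"]
print(chars)  # ['s', 'e', 'n', 'a', 't', 'e']


# The posted function, unchanged.
def relabel(x):
    for i in x:
        if ("health" or "rejuvenation") in i:
            return "health"
        elif ("gover" or "senate") in i:
            return "government"
        else:
            return i


# No keyword fits inside a single character, so the else branch returns
# on the very first iteration with just the first character of the label.
print(relabel("rejuvenation"))  # r
```

Run exactly as posted, every call exits on the first loop iteration, so no label can ever be mapped to `"health"` or `"government"`.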
Then I apply it with:
```python
data['new_label'] = data['old_label'].apply(relabel)
```
But it just returns each input unchanged, so the result is a new column containing exactly the same data.
How can I fix this?