I have a pandas column containing strings. I want to group the strings that are similar and replace each one with its category. In my real data I have 6 different strings, and I want to replace them with 3 category strings.
I found this answer on how to map many values to one using the replace() function, so I tried extending some of the answers to do a many-to-one mapping for several groups at once. However, not all of my values were changed correctly and I'm not sure why.
As an example:
import pandas as pd
df1 = pd.DataFrame({'col1': ['foo', 'foo too', 'bar', 'BAR', 'bar ii']})
col1
0 foo
1 foo too
2 bar
3 BAR
4 bar ii
From one of the answers, it looked like you could use ‘|’ to separate alternative keys if you used regex, so I tried the following:
df1['col1'].replace({'foo|foo too': 'Foo',
                     'bar|BAR|bar ii': 'Bar'}, regex=True)
Which converted most of my strings, but not all:
col1
0 Foo
1 Foo too
2 Bar
3 Bar
4 Bar ii
From this example, I would guess it has something to do with the spaces, although in my actual data some of the strings with spaces were replaced correctly, so I'm not sure. Any help with why this doesn't work, or how I could achieve what I'm after, would be appreciated.
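One way to check the guess about spaces is to run the same patterns through Python's re module directly. This is only a minimal sketch: it assumes that replace(..., regex=True) substitutes just the matched part of each string, much like re.sub does, which I have not confirmed against the pandas source. The substitute helper is a hypothetical name used purely for illustration.

```python
import re

# Hypothetical helper for illustration only: applies one pattern to every
# value, replacing just the matched part (assumed to mirror regex=True).
def substitute(pattern, replacement, values):
    return [re.sub(pattern, replacement, v) for v in values]

values = ['foo', 'foo too', 'bar', 'BAR', 'bar ii']

# Unanchored alternation: the leftmost alternative that matches is used,
# so 'foo' is replaced inside 'foo too' and the remainder is left alone.
partial = substitute('foo|foo too', 'Foo', values)
partial = substitute('bar|BAR|bar ii', 'Bar', partial)
print(partial)  # ['Foo', 'Foo too', 'Bar', 'Bar', 'Bar ii']

# Anchored alternation: ^...$ requires the whole string to match, so the
# longer alternatives are tried when the shorter one cannot reach the end.
whole = substitute('^(foo|foo too)$', 'Foo', values)
whole = substitute('^(bar|BAR|bar ii)$', 'Bar', whole)
print(whole)  # ['Foo', 'Foo', 'Bar', 'Bar', 'Bar']
```

The first result matches the unexpected output above, which suggests the cause may be partial matching on the shorter alternative rather than the spaces themselves. If the assumption holds, the anchored patterns could presumably be passed as the dictionary keys to replace() with regex=True in the same way.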