I am trying to swap values between the Range and Unit columns in the dataframe below, based on this condition: if Unit contains a hyphen (-), then replace Unit with Range and Range with Unit. To do that, I am creating a unit_backup column so that I don’t lose the original Unit value.
1. dataframe
import pandas as pd

sample_df = pd.DataFrame({'Range': ['34-67', 12, 'gm', '45-90'],
                          'Unit': ['ml', '35-50', '10-100', 'mg']})
sample_df
   Range    Unit
0  34-67      ml
1     12   35-50
2     gm  10-100
3  45-90      mg
2. Function
Below is the code I tried, but it raises an error:
def range_unit_correction_fn(df):
    # creating backup of Unit column
    df['unit_backup'] = df['Unit']
    # condition check
    if df['unit_backup'].str.contains("-"):
        # if condition is True then replace `Unit` value with `Range` and `Range` with `unit_backup`
        df['Unit'] = df['Range']
        df['Range'] = df['unit_backup']
    else:
        # if condition False then keep the same value
        df['Range'] = df['Range']
    # drop the backup column
    df = df.drop(['unit_backup'],axis=1)
    return df
- Applying the above function on the dataframe
sample_df = sample_df.apply(range_unit_correction_fn, axis=1)
sample_df
Error:
1061 def apply_standard(self):
1062 if self.engine == "python":
-> 1063 results, res_index = self.apply_series_generator()
...
----> 4 if df['unit_backup'].str.contains("-"):
5 df['Unit'] = df['Range']
6 df['Range'] = df['unit_backup']
AttributeError: 'str' object has no attribute 'str'
It seems like some silly mistake, but I am not sure where I am going wrong.
Appreciate any sort of help here.
Alternatively, in one line, with no need for a temporary or backup column:
sample_df[['Range', 'Unit']] = sample_df[['Unit', 'Range']].where(sample_df['Unit'].str.contains('-'), sample_df[['Range', 'Unit']].values)
which gives:
    Range Unit
0   34-67   ml
1   35-50   12
2  10-100   gm
3   45-90   mg
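If the one-liner is hard to read, the same swap can be sketched with a boolean mask and .loc. This is only an illustration, not the only way to do it: the name mask is made up here, dtype=object is set explicitly so that the mixed value 12 can move into Unit on any pandas version, and astype(str) is a defensive assumption in case Unit ever holds non-string values.

```python
import pandas as pd

# dtype=object is an assumption made here so both columns can hold mixed values
sample_df = pd.DataFrame({'Range': ['34-67', 12, 'gm', '45-90'],
                          'Unit': ['ml', '35-50', '10-100', 'mg']},
                         dtype=object)

# True for rows whose Unit looks like a range (contains a literal hyphen)
mask = sample_df['Unit'].astype(str).str.contains('-', regex=False)

# .to_numpy() drops the column labels, so the values are assigned by position
# instead of being aligned back to their original column names
sample_df.loc[mask, ['Range', 'Unit']] = sample_df.loc[mask, ['Unit', 'Range']].to_numpy()

print(sample_df)
```

Converting the right-hand side to a NumPy array is what makes the swap happen; assigning the DataFrame slice directly would align on column names and leave the values where they were.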
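With apply(..., axis=1), the function receives one row at a time as a Series, not the whole dataframe. So when you access df['unit_backup'], you get a scalar string value, not a pandas Series, and calling .str on it raises an error.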
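A quick way to see the difference is to compare the type of the whole column with the type each row hands to the function. This is a small sketch for inspection only; the names row_types and column_type are made up, and dtype=object is set explicitly as an assumption so the cells stay plain Python objects.

```python
import pandas as pd

sample_df = pd.DataFrame({'Range': ['34-67', 12, 'gm', '45-90'],
                          'Unit': ['ml', '35-50', '10-100', 'mg']},
                         dtype=object)

# On the whole dataframe, selecting a column gives a Series (which has .str)
column_type = type(sample_df['Unit']).__name__

# Inside apply(axis=1), each row is a Series and row['Unit'] is a single cell
row_types = sample_df.apply(lambda row: type(row['Unit']).__name__, axis=1)

print(column_type)
print(row_types.tolist())
```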
To fix it, check the condition directly on the string value, row by row:
def range_unit_correction_fn(df):
    # creating backup of Unit column
    df['unit_backup'] = df['Unit']
    # condition check
    if '-' in df['unit_backup']:
        # if condition is True then replace `Unit` value with `Range` and `Range` with `unit_backup`
        df['Unit'] = df['Range']
        df['Range'] = df['unit_backup']
    # drop the backup column
    df = df.drop(['unit_backup'])
    return df
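For comparison, the row-wise idea can also be written without the backup column by using a tuple swap. This is a minimal sketch under a few assumptions: swap_if_range and fixed are made-up names, str(...) is added so the membership test does not fail if Unit ever holds a non-string, and dtype=object is set explicitly on the sample data.

```python
import pandas as pd

sample_df = pd.DataFrame({'Range': ['34-67', 12, 'gm', '45-90'],
                          'Unit': ['ml', '35-50', '10-100', 'mg']},
                         dtype=object)

def swap_if_range(row):
    # row is a Series for one record; row['Unit'] is a single value
    if '-' in str(row['Unit']):
        row = row.copy()  # avoid mutating the object that apply passed in
        # the right-hand side is evaluated first, so no backup is needed
        row['Range'], row['Unit'] = row['Unit'], row['Range']
    return row

fixed = sample_df.apply(swap_if_range, axis=1)
print(fixed)
```

Row-wise apply is usually slower than a vectorized approach on large dataframes, so this form is mainly useful when the per-row logic is too irregular to express with a mask.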