I have the code below:
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
age=[5, 6, np.nan],
born=[pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
name=['Alfred', 'Batman', np.nan],
toy=[np.nan, 'Batmobile', 'Joker']))
Now I want to drop the rows where the column name OR the column toy has a NaN/empty string value. I tried the code below:
df[~df[['name', 'toy']].isna()]
I was expecting only the 2nd row to be returned. Could you please help me see where I went wrong?
What is wrong
df[~df[['name', 'toy']].isna()]
returns a DataFrame (2D), so you are applying a mask to the whole DataFrame, keeping values where the mask is True (the shape remains unchanged).
You’re essentially doing:
df.where(~df[['name', 'toy']].isna())
which masks the non-target columns with NaN, and replaces the NaN values in name/toy with NaN (= no change).
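A quick sketch (using the same frame as in the question) shows that the 2D mask leaves the shape untouched and only blanks values out:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    age=[5, 6, np.nan],
    born=[pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
    name=['Alfred', 'Batman', np.nan],
    toy=[np.nan, 'Batmobile', 'Joker']))

# 2D boolean mask: same effect as df.where(...)
masked = df[~df[['name', 'toy']].isna()]

print(masked.shape)               # (3, 4) -- same shape as df
print(masked['age'].isna().all()) # True: non-target columns are fully masked
```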
How to fix it
What you want to do is boolean indexing: you must pass a Series, and thus aggregate with any(axis=1). Also, since '' is not considered a NaN, you first have to replace it with None:
out = df[~df[['name', 'toy']].replace('', None).isna().any(axis=1)]
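Breaking this down on the frame from the question (where the replace step is a no-op, since there are no empty strings), any(axis=1) reduces the 2D mask to one boolean per row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    age=[5, 6, np.nan],
    born=[pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
    name=['Alfred', 'Batman', np.nan],
    toy=[np.nan, 'Batmobile', 'Joker']))

# One boolean per row: True if name OR toy is NaN
row_has_na = df[['name', 'toy']].isna().any(axis=1)
print(row_has_na.tolist())   # [True, False, True]

out = df[~row_has_na]
print(out.index.tolist())    # [1] -- only Batman's row survives
```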
Note that this is equivalent to using notna + all without the ~:
out = df[df[['name', 'toy']].replace('', None).notna().all(axis=1)]
Output:
age born name toy
1 6.0 1939-05-27 Batman Batmobile