I have a DataFrame like this:
import numpy as np
import pandas as pd
df_challenge = pd.DataFrame({'x': [1, pd.NA, 6, 9, pd.NA, 0, 9, 10, 0, 9, pd.NA, 0],
                             'y': [0, 7.2, pd.NA, 10, 0, 1, 9.2, 10.65, pd.NA, 9, pd.NA, 0],
                             'y_copy': [0, 7.2, np.nan, 10, 0, 1, 9.2, 10.65, np.nan, 9, np.nan, 0]})
df_challenge = df_challenge.convert_dtypes()
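A quick way to confirm what convert_dtypes() actually produced is to print the dtypes. This is a minimal sketch assuming pandas 2.x (the question uses 2.2.2); on that version I would expect x to become the nullable Int64 dtype and both y and y_copy to become the nullable Float64 dtype.

```python
import numpy as np
import pandas as pd

# Same frame as in the question
df_challenge = pd.DataFrame({
    'x': [1, pd.NA, 6, 9, pd.NA, 0, 9, 10, 0, 9, pd.NA, 0],
    'y': [0, 7.2, pd.NA, 10, 0, 1, 9.2, 10.65, pd.NA, 9, pd.NA, 0],
    'y_copy': [0, 7.2, np.nan, 10, 0, 1, 9.2, 10.65, np.nan, 9, np.nan, 0],
})
df_challenge = df_challenge.convert_dtypes()

# Show which nullable extension dtype each column was converted to
print(df_challenge.dtypes)
```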
I then explicitly cast one of the columns back to a plain NumPy float dtype:
df_challenge.y_copy = df_challenge.y_copy.astype('float')
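The cast above turns a nullable Float64 column into NumPy float64, and in the process each pd.NA is replaced by np.nan. A minimal sketch on a small hypothetical series `s` (not the question's data), assuming pandas 2.x:

```python
import numpy as np
import pandas as pd

# A small nullable Float64 column with one missing value (pd.NA)
s = pd.Series([0.0, 7.2, pd.NA], dtype="Float64")

# Casting to plain NumPy float64: the pd.NA entry becomes np.nan
s_float = s.astype("float")
print(s_float.dtype)             # float64
print(s_float.isna().tolist())   # [False, False, True]
```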
I then create two new columns using the code below:
df_challenge = df_challenge.assign(z = df_challenge.x/df_challenge.y)  # Int64 / Float64 -> nullable Float64
df_challenge = df_challenge.assign(z1 = df_challenge.x.astype('float')/df_challenge.y_copy)  # float64 / float64 -> NumPy float64
Now, if I use the Series methods .isnull() or .isna(), they don't give the correct result for column z.
The code below gives these results:
df_challenge.z.isna().sum()     # 5, but it should be 6
df_challenge.z.isnull().sum()   # 5
df_challenge.z1.isna().sum()    # 6, which is correct
df_challenge.z1.isnull().sum()  # 6
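To see which row accounts for the difference between 5 and 6, the sketch below rebuilds the frame, lists the rows where z.isna() and z1.isna() disagree, and then counts pd.NA and NaN together by first converting z to NumPy float64 with Series.to_numpy(dtype=..., na_value=np.nan). It assumes pandas 2.x; the names df, mismatch and z_as_float are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [1, pd.NA, 6, 9, pd.NA, 0, 9, 10, 0, 9, pd.NA, 0],
    'y': [0, 7.2, pd.NA, 10, 0, 1, 9.2, 10.65, pd.NA, 9, pd.NA, 0],
    'y_copy': [0, 7.2, np.nan, 10, 0, 1, 9.2, 10.65, np.nan, 9, np.nan, 0],
}).convert_dtypes()
df['y_copy'] = df['y_copy'].astype('float')
df = df.assign(z=df.x / df.y, z1=df.x.astype('float') / df.y_copy)

# Row labels where isna() gives a different answer for z and z1
mismatch = df.index[(df.z.isna() != df.z1.isna()).to_numpy()]
print(df.loc[mismatch, ['x', 'y', 'z', 'z1']])

# Count pd.NA and NaN together: map pd.NA -> np.nan, then test the NumPy array
z_as_float = df.z.to_numpy(dtype='float64', na_value=np.nan)
print(np.isnan(z_as_float).sum())
```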
My question is: why don't .isnull() and .isna() behave correctly on column z (or am I mistaken here)? The difference between the two columns is the data types involved in the calculation: for z the division is between two of pandas' newer nullable dtypes (Int64 / Float64), whereas for z1 it is between two plain floats.
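The dtype difference can be reduced to a two-row sketch that performs 1/0 and 0/0 once with nullable dtypes and once with NumPy floats. As far as I understand it (an assumption, not something the question establishes), a Float64 column tracks pd.NA in a separate mask, so a NaN produced by arithmetic such as 0/0 is stored as an ordinary value that .isna() may not report; the sketch prints both results so the behavior can be checked on the installed pandas version.

```python
import numpy as np
import pandas as pd

# Same two divisions (1/0 and 0/0) with nullable dtypes and with NumPy floats
num_nullable = pd.Series([1, 0], dtype="Int64")
den_nullable = pd.Series([0.0, 0.0], dtype="Float64")
num_numpy = num_nullable.astype("float")
den_numpy = den_nullable.astype("float")

res_nullable = num_nullable / den_nullable  # nullable Float64 result
res_numpy = num_numpy / den_numpy           # NumPy float64 result

# Compare what isna() reports for each result
print(res_nullable.dtype, res_nullable.isna().tolist())
print(res_numpy.dtype, res_numpy.isna().tolist())
```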
To work around the problem, instead of .isna() / .isnull() I tried np.isfinite combined with the pandas negation operator ~, and it correctly picks out the NaN in z1.
So my second question is: is this a good way to pull out such NaNs in pandas?
Here is what worked:
df_challenge.loc[~df_challenge.z.pipe(np.isfinite),:]
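If the goal is to treat pd.NA and NaN alike, one possible alternative to the np.isfinite trick is to convert the column to NumPy float64 (mapping pd.NA to NaN) and test that plain array. The helper name missing_or_nan and its include_inf flag are hypothetical; whether infinities should also be flagged is an assumption left to the caller.

```python
import numpy as np
import pandas as pd

def missing_or_nan(s: pd.Series, include_inf: bool = False) -> pd.Series:
    """Hypothetical helper: True where s is pd.NA or NaN (optionally also +/-inf)."""
    # Map pd.NA to np.nan so both kinds of "missing" become NaN in one NumPy array
    values = s.to_numpy(dtype="float64", na_value=np.nan)
    flags = ~np.isfinite(values) if include_inf else np.isnan(values)
    # Plain NumPy bool flags, aligned to the original index
    return pd.Series(flags, index=s.index)

# 1/0 -> inf, 0/0 -> NaN, and one genuine pd.NA
s = pd.Series([1, 0, pd.NA], dtype="Int64") / pd.Series([0.0, 0.0, 1.0], dtype="Float64")
print(missing_or_nan(s).tolist())                    # [False, True, True]
print(missing_or_nan(s, include_inf=True).tolist())  # [True, True, True]
```

The returned mask is an ordinary bool Series with no pd.NA in it, so it can be passed straight to .loc, e.g. df_challenge.loc[missing_or_nan(df_challenge.z), :], without depending on how .loc handles missing entries in a nullable boolean mask.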
Although this works for me, I am not satisfied with the workaround. I want to understand what is going on and find a better solution.
Thanks
Pandas version: 2.2.2
Python version: Python 3.10.14