I’m trying to create a new variable (column) in a DataFrame, and it requires some complex conditional logic. I’d like the default value to be `np.nan` if all of the conditions fail.
My approach has been like this:

```python
import numpy as np
import pandas as pd

startingdf['name_of_new_variable'] = pd.Series(np.nan, index=startingdf.index).case_when(
    caselist=[
        # (condition, replacement) pairs
        (..., ...),
        (..., ...),
        (..., ...),
        # etc. etc.
    ]
)
```
The key thing I’m hoping for here is that `case_when` will be called on a Series populated with `np.nan` (hence `pd.Series(np.nan, ...)`). That way, any row of the `name_of_new_variable` column where none of the conditions in `caselist` are met will automatically get `np.nan`.
But there’s something wrong with `pd.Series(np.nan, index=startingdf.index)`. When I write…

```python
pd.Series(np.nan, index=startingdf.index).case_when()
```

…my type checker (Pylance, as it happens) complains that “object of type float is not callable.”
So somehow, `pd.Series(np.nan, index=startingdf.index).case_when` is being treated as a float instead of the (callable!) `case_when` method of a Series instance!
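One way to tell whether this is a runtime problem or only a static type-checking complaint is to inspect the attribute directly. This is a sketch; `s` is a hypothetical stand-in Series, and the `case_when` check only comes out `True` on pandas 2.2 or later:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.Series(np.nan, index=startingdf.index)
s = pd.Series(np.nan, index=range(3))

print(pd.__version__)
# Does the attribute exist, and is it callable (i.e. a bound method)?
print(hasattr(s, "case_when"), callable(getattr(s, "case_when", None)))
print(s.dtype)  # float64: every element is a float NaN
```

If this prints `True True`, the method exists and is callable at runtime, which would suggest the “float” comes from how the type checker sees `Series` rather than from pandas itself.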
What am I doing wrong?