Why does pandas allow operations between NaN and sets, strings, or other non-float types (yielding NaN), whereas pure Python does not?
import pandas as pd
import numpy as np
pd.Series([np.nan]) - pd.Series([{5}]) # yields a NaN-series
pd.Series([np.nan]) - set([5]) # throws error "TypeError: unsupported operand type(s) for -: 'float' and 'set'"
NaNs have a specific meaning in a pandas Series: they mark missing values, and they should remain NaN under most operations. Series operations therefore propagate them as a convenience for some types.
For example with strings:
pd.Series([np.nan, 'a', 'b']) + 'c'
0 NaN
1 ac
2 bc
dtype: object
This is most likely handled because string operations are quite common. Set operations are simply not handled out of the box when NaNs are present (they do work in the absence of NaNs, for example pd.Series([{4, 5}, {5}]) - set([5])), maybe because sets are iterables, a particular type of data that is always tricky to put in a Series.
In any case, keep in mind that string, set, and other pure-Python object operations are not really vectorized by pandas.
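The contrast described above can be checked with a short sketch. It assumes a reasonably recent pandas; the exact error message may differ between versions, so only the exception type is recorded. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Without NaNs, the subtraction is applied to each set in turn.
clean = pd.Series([{4, 5}, {5}]) - {5}

# With a NaN in the Series, the same operation is expected to raise,
# as reported in the question.
try:
    pd.Series([np.nan, {4, 5}, {5}]) - {5}
    raised = False
except TypeError:
    raised = True

# Strings are handled: the NaN is skipped and stays NaN.
strings = pd.Series([np.nan, "a", "b"]) + "c"
```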
One possibility would be to drop the NaNs, perform the set operation, then restore the NaNs:
s = pd.Series([np.nan, {4, 5}, {5}])
s.dropna().sub({5}).reindex_like(s)
0 NaN
1 {4}
2 {}
dtype: object
Alternatively, with a custom function:
s.map(lambda x: x if pd.isna(x) else x-{5})
0 NaN
1 {4}
2 {}
dtype: object
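If this pattern comes up often, the two approaches can be folded into a small helper. The following is only a sketch: `apply_skipna` is a hypothetical name, not a pandas function, and it assumes the Series has a unique index so that `reindex` can restore the dropped positions.

```python
import numpy as np
import pandas as pd


def apply_skipna(series, func):
    """Apply func to non-missing elements only (hypothetical helper)."""
    # Drop the NaNs, then call func once per remaining element.
    result = series.dropna().map(func)
    # Reindexing on the original labels brings the dropped rows back as NaN.
    return result.reindex(series.index)


s = pd.Series([np.nan, {4, 5}, {5}])
out = apply_skipna(s, lambda x: x - {5})
```

Like the versions above, this still calls the Python function element by element, so it is a convenience rather than a speed-up.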