I’m using a subclass of pandas’ DataFrame class. The subclass needs to have a property that is a list. Here’s an example:
```python
import pandas as pd

class MyDataFrame(pd.DataFrame):
    def __init__(self, data, colors, *args, **kwargs):
        m = pd.DataFrame(data)
        super().__init__(m, *args, **kwargs)
        self.colors = colors

my_df = MyDataFrame(
    {
        "name": ["Fred", "Wilma"],
        "age": [42, 38]
    },
    colors=["red", "yellow", "green"])
```
This gets me the following warning on `self.colors = colors`:

```
UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
```
It appears that the problem is DataFrame’s feature of treating column headers as attributes and interpreting the `self.colors = colors` line as a request to add a column to the DataFrame, which it very reasonably declines to do. I’ve tried adding a setter, without effect. I also tried moving the attribute assignment above the `super().__init__` call, but ended up in an infinite recursion. What can I do to fix this?
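A minimal sketch (not from the original post) that reproduces this behaviour on a plain `DataFrame`, assuming a reasonably recent pandas: the warning is triggered when a *new* attribute name is assigned a list-like value, while a scalar value for a new name passes silently. As far as I can tell from pandas' `NDFrame.__setattr__`, the list is still stored as an ordinary instance attribute rather than as a column. The attribute name `label` is just an illustrative choice.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"name": ["Fred", "Wilma"], "age": [42, 38]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    # New attribute name + list-like value: pandas emits the UserWarning.
    df.colors = ["red", "yellow", "green"]
    # New attribute name + scalar value: no warning is expected.
    df.label = "flintstones"

messages = [str(w.message) for w in caught if issubclass(w.category, UserWarning)]
print(messages)
print("colors" in df.columns)  # no "colors" column was added
print(df.colors)  # the list was kept as a plain instance attribute
```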
You can add arbitrary data (effectively metadata) to a DataFrame using its `attrs` property, but the documentation does come with this caveat:

> attrs is experimental and may change without warning.

That said, `attrs` has been “experimental” for a few years now, and this should work:
```python
import pandas as pd

class MyDataFrame(pd.DataFrame):
    def __init__(self, data, colors, *args, **kwargs):
        # as was mentioned, you don't really need the intermediate DataFrame 'm'
        super().__init__(data, *args, **kwargs)
        # store the list in the DataFrame's attrs metadata dict
        self.attrs['colors'] = colors

my_df = MyDataFrame(
    {
        "name": ["Fred", "Wilma"],
        "age": [42, 38]
    },
    colors=["red", "yellow", "green"])

print(my_df)
print(my_df.attrs['colors'])
```

```
    name  age
0   Fred   42
1  Wilma   38
['red', 'yellow', 'green']
```
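If you also want the `my_df.colors` attribute access the question asks for, one possible sketch (my assumption, not part of the answer itself) keeps `attrs` as the storage and puts a property in front of it. Because pandas' `__setattr__` first tries ordinary attribute access, assigning to a name the class already defines as a property should reach the property's setter instead of the column-creation check that emits the warning. The getter uses `.get()` so it returns `None` rather than raising `KeyError` before a value has been stored.

```python
import warnings

import pandas as pd


class MyDataFrame(pd.DataFrame):
    def __init__(self, data, colors, *args, **kwargs):
        super().__init__(data, *args, **kwargs)
        # Routed to the property setter below, not to column creation.
        self.colors = colors

    @property
    def colors(self):
        # .get() returns None (instead of raising KeyError) before a value is set.
        return self.attrs.get("colors")

    @colors.setter
    def colors(self, value):
        # Store a copy of the list in the attrs metadata dict.
        self.attrs["colors"] = list(value)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    my_df = MyDataFrame(
        {"name": ["Fred", "Wilma"], "age": [42, 38]},
        colors=["red", "yellow", "green"],
    )

user_warnings = [w for w in caught if issubclass(w.category, UserWarning)]
print(len(user_warnings))
print(my_df.colors)
print(list(my_df.columns))
```

Operations that return a new frame may or may not carry `attrs` along depending on the pandas version, so it is worth checking that on the version you use.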
However, I think I agree with @roganjosh: you may be better served by not subclassing DataFrame directly.
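One way to read that suggestion (a hypothetical sketch; @roganjosh's actual comment isn't reproduced here) is composition: keep a plain DataFrame and the list side by side in a small container class instead of inheriting from DataFrame. `ColoredFrame` and `from_data` are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass, field

import pandas as pd


# eq=False: the generated __eq__ would compare the DataFrames with ==, which is
# element-wise in pandas and can't be reduced to a single bool; identity is used instead.
@dataclass(eq=False)
class ColoredFrame:
    """Hypothetical container: a plain DataFrame plus the extra list."""

    df: pd.DataFrame
    colors: list[str] = field(default_factory=list)

    @classmethod
    def from_data(cls, data, colors):
        # Build the DataFrame from the raw data and copy the colors list.
        return cls(pd.DataFrame(data), list(colors))


frame = ColoredFrame.from_data(
    {"name": ["Fred", "Wilma"], "age": [42, 38]},
    colors=["red", "yellow", "green"],
)
print(frame.df)
print(frame.colors)
```

With this layout, pandas operations work on `frame.df` directly and the colors never pass through pandas' attribute handling, at the cost of writing `frame.df` wherever the DataFrame itself is needed.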