Suppose I have the following classes. My questions are:
- Is the size of `tmp1` much larger than `tmp2`, as it needs to store the extra `df_x`?
- Is there any strong reason to prefer `Tmp2` over `Tmp1`?

In this simple example, `df_x` needs little processing; in other examples, I may want to do heavy processing on `df_x`. I'm wondering how much extra burden `Tmp1` would create.

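from dataclasses import dataclass

import numpy as np
import pandas as pd

@dataclass
class Tmp0:
    df_x: pd.DataFrame
    keep_zeros: bool = False

    def transform(self):
        # Treat zeros as missing, then drop all-NaN rows and columns.
        if not self.keep_zeros:
            self.df_x = self.df_x.replace(0, np.nan).dropna(how='all').dropna(how='all', axis=1)
        return self.df_x
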
@dataclass
class Tmp1:
    df_x: pd.DataFrame
    df_y: pd.DataFrame
    keep_zeros: bool = False

    def __post_init__(self):
        # Clean df_x once at construction and keep the result on the instance.
        self.df_x = Tmp0(self.df_x, self.keep_zeros).transform()

    def transform(self):
        # Mask df_y wherever the cleaned df_x is missing.
        self.df_y = self.df_y.where(self.df_x.notna())
        return self.df_y

@dataclass
class Tmp2:
    df_y: pd.DataFrame

    def transform(self, df_x, keep_zeros=False):
        # df_x is cleaned locally and is not stored on the instance.
        df_x = Tmp0(df_x, keep_zeros).transform()
        self.df_y = self.df_y.where(df_x.notna())
        return self.df_y

df_x = load_data()  # df_y is assumed to be loaded similarly
tmp1 = Tmp1(df_x, df_y)
tmp2 = Tmp2(df_y)
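
One rough way to check the size question is to add up the memory of the DataFrames each instance holds. The sketch below is only an illustration under assumptions: `StoresX` and `StoresOnlyY` are hypothetical stand-ins for `Tmp1` and `Tmp2`, `clean` and `frame_bytes` are helper names made up here, and the toy frames replace `load_data()`. A dataclass field holds a reference to the DataFrame rather than a copy, so `df_y` is shared by both instances and the difference should come down to the cleaned `df_x` alone.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Same cleaning as Tmp0.transform with keep_zeros=False.
    return df.replace(0, np.nan).dropna(how="all").dropna(how="all", axis=1)


@dataclass
class StoresX:  # hypothetical stand-in for Tmp1
    df_x: pd.DataFrame
    df_y: pd.DataFrame

    def __post_init__(self):
        self.df_x = clean(self.df_x)


@dataclass
class StoresOnlyY:  # hypothetical stand-in for Tmp2
    df_y: pd.DataFrame


def frame_bytes(obj) -> int:
    # Sum the memory of every DataFrame attribute held by the instance.
    return sum(
        int(v.memory_usage(deep=True).sum())
        for v in vars(obj).values()
        if isinstance(v, pd.DataFrame)
    )


# Toy data standing in for load_data(); shapes and values are made up.
df_x = pd.DataFrame({"a": [0, 1, 2], "b": [0, 0, 0], "c": [0, 3, 4]})
df_y = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [7.0, 8.0, 9.0]})

with_x = StoresX(df_x, df_y)
only_y = StoresOnlyY(df_y)

# The two instances share the same df_y object, so only df_x adds memory.
extra = frame_bytes(with_x) - frame_bytes(only_y)
print(f"extra bytes held by the df_x-storing instance: {extra}")
```

On real data the same `frame_bytes` call could be pointed at `tmp1` and `tmp2` directly. Note that the caller's original `df_x` stays alive for as long as the caller keeps its own variable, regardless of which class is used.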