I am trying to get the residuals from a simple regression that is run separately for each Year and Group. This is what I have done so far. However, is there a way to get the residuals as a new column alongside the original dataframe?
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({'Name': ['a','b','c','d','e','a','b','c','d','e','a','b','c','d','e'],
                   'Year': [2020,2020,2020,2020,2020,2021,2021,2021,2021,2021,2022,2022,2022,2022,2022],
                   'Group': ['H','L','N','N','N','H','L','N','N','N','H','L','N','N','N'],
                   'Value': [0.3,0.2,0.3,0.1,0.1,0.2,0.3,0.2,0.2,0.1,0.4,0.1,0.1,0.3,0.1],
                   'Mom': [5,1,3,5,2,1,1,3,6,4,4,7,8,3,2]})
def a(row):
    # `row` is the sub-DataFrame for one (Year, Group) combination
    X = row['Value']  # independent variable
    y = row['Mom']    # dependent variable
    X = sm.add_constant(X)
    reg = sm.OLS(y, X).fit()
    return reg.resid
df.groupby(['Year','Group']).apply(a)
I can certainly append the above output to the original dataframe, but I am trying to achieve this with transform. I tried the following, but it did not work:
df.groupby(['Year','Group']).transform(a)