I found an old pandas question about this same problem:
I want to do something that seems pretty easy in a spreadsheet, but I can’t figure out the syntax in pandas. I have a data set that can be grouped. I want to determine the aggregate stats for each of the groups, but then use the aggregates to create a new column back in the original data frame.
For example, if my data frame looks like this:
import pandas
d = pandas.DataFrame({'class': ['f1', 'f2', 'f3', 'f1'],
                      'user': ['jack', 'jen', 'joe', 'jan'],
                      'screen': [12, 23, 13, 15]})
Original accepted solution:
# Per-row sample standard deviation of 'screen' within each 'class' group
d['gp'] = d.groupby('class')['screen'].transform('std')
print(d)
   class  screen  user       gp
0     f1      12  jack  2.12132
1     f2      23   jen      NaN
2     f3      13   joe      NaN
3     f1      15   jan  2.12132
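The accepted answer relies on `transform`, which computes one aggregate per group and broadcasts it back to every row of that group. As a sketch (not part of the original answer), the same column can be built in two explicit steps, aggregating per group and then joining the result back on `class`, which is the pattern a Spark solution has to reproduce. The names `per_group`, `via_join` and `via_transform` are illustrative only.

```python
import pandas as pd

d = pd.DataFrame({'class': ['f1', 'f2', 'f3', 'f1'],
                  'user': ['jack', 'jen', 'joe', 'jan'],
                  'screen': [12, 23, 13, 15]})

# Step 1: one sample standard deviation (ddof=1) per group, as a Series indexed by 'class'
per_group = d.groupby('class')['screen'].std().rename('gp')

# Step 2: join the per-group value back onto every row whose 'class' matches;
# DataFrame.join with on= matches a column of d against the index of per_group
via_join = d.join(per_group, on='class')

# transform('std') builds the same column in one step, aligned to d's index
via_transform = d.groupby('class')['screen'].transform('std')

print(via_join)
```

Single-row groups (`f2`, `f3`) come out as NaN in both versions, because a sample standard deviation needs at least two values.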
My question: how can we achieve the same result using only the Databricks APIs (SQL or PySpark)?