I am a long-time SAS/SQL user and have always defaulted to SQL for my group-bys, for example:
select region
,case when age < 5 then 'Low'
when age >= 5 and age <= 10 then 'Middle'
else 'High' end as duration
,sum(1) as total
,sum(profit) as profit
,sum(profit)/sum(1) as avg_profit
,max(revenue) as max_revenue
from table
where region not in ('A')
group by
region,(case when age < 5 then 'Low'
when age >= 5 and age <= 10 then 'Middle'
else 'High' end)
I am trying to recreate this query in pandas, but I don't know how to write it as concisely as the SQL above.
Can anyone suggest an efficient way to do this in pandas that doesn't involve five merges and creating new columns beforehand?
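A minimal sketch of one way to do this in a single chain, assuming a DataFrame with columns `region`, `age`, `profit` and `revenue` (names taken from the query; the function name `summarise` and the sample data are made up for illustration). It uses `np.select` for the CASE expression, `.assign` to add `duration` only to an intermediate copy (the original frame is not modified), and groupby named aggregation for the SUM/MAX columns. `np.select` is used instead of `pd.cut` because `pd.cut` intervals are closed on only one side, which cannot reproduce both `age < 5` and `age <= 10` exactly.

```python
import numpy as np
import pandas as pd


def summarise(df: pd.DataFrame) -> pd.DataFrame:
    """Rough pandas equivalent of the SQL query above (a sketch, not a drop-in)."""
    return (
        df[df["region"] != "A"]                          # WHERE region NOT IN ('A')
        .assign(duration=lambda d: np.select(
            [d["age"] < 5, d["age"] <= 10],              # CASE WHEN: first true condition wins
            ["Low", "Middle"],
            default="High",                              # ELSE 'High' (also catches missing age)
        ))
        .groupby(["region", "duration"], as_index=False)  # GROUP BY region, duration
        .agg(
            total=("profit", "size"),                    # SUM(1): number of rows per group
            profit=("profit", "sum"),                    # SUM(profit)
            max_revenue=("revenue", "max"),              # MAX(revenue)
        )
        .assign(avg_profit=lambda d: d["profit"] / d["total"])  # SUM(profit)/SUM(1)
    )


if __name__ == "__main__":
    # Hypothetical sample data, only to show the shape of the result.
    example = pd.DataFrame({
        "region": ["A", "B", "B", "B", "B", "C"],
        "age": [3, 4, 7, 9, 12, 10],
        "profit": [100, 10, 20, 50, 30, 40],
        "revenue": [5, 50, 60, 90, 70, 80],
    })
    print(summarise(example))
```

Rows with a missing `region` pass the `!= "A"` filter, but `groupby` drops missing keys by default (`dropna=True`), so they are excluded from the result, as they would be in SQL, where `NULL NOT IN ('A')` is not true.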