I have this table (myt) :
CREATE TABLE myt (
name VARCHAR(50),
food VARCHAR(50),
d1 INT
);
INSERT INTO myt (name, food, d1) VALUES
('john', 'pizza', 2010),
('john', 'pizza', 2011),
('john', 'cake', 2012),
('tim', 'apples', 2015),
('david', 'apples', 2020),
('david', 'apples', 2021),
('alex', 'cookies', 2005),
('alex', 'cookies', 2006);
name food d1 food_year
john pizza 2010 2010
john pizza 2011 2011
john cake 2012 2012
tim apples 2015 2015
david apples 2020 2020
david apples 2021 2021
alex cookies 2005 2005
alex cookies 2006 2006
I wrote the following query to find out the percentage breakdowns of each food per name:
WITH FoodCounts AS (
SELECT name,
food,
COUNT(*) as food_count
FROM myt
GROUP BY name, food
),
TotalCounts AS (
SELECT name,
COUNT(*) as total_count
FROM myt
GROUP BY name
)
SELECT fc.name,
fc.food,
(fc.food_count * 100.0) / tc.total_count as percentage
FROM FoodCounts fc
JOIN TotalCounts tc
ON fc.name = tc.name;
name food percentage
alex cookies 100.00000
david apples 100.00000
john cake 33.33333
john pizza 66.66667
tim apples 100.00000
My Question: I am now trying to modify this query to find out the cumulative percentages. For example, as of 2011 – what was John’s food breakdown? As of 2012 what was John’s food breakdown?
I tried to write a series of CTE’s using window functions to answer this:
WITH YearlyFoodCounts AS (
SELECT name,
food,
food_year,
COUNT(*) as food_count
FROM myt
GROUP BY name, food, food_year
),
CumulativeCounts AS (
SELECT name,
food_year,
SUM(food_count) OVER (PARTITION BY name ORDER BY food_year) as cumulative_count
FROM YearlyFoodCounts
)
SELECT yfc.name,
yfc.food,
yfc.food_year,
yfc.food_count,
cc.cumulative_count,
(yfc.food_count * 100.0) / cc.cumulative_count as percentage
FROM YearlyFoodCounts yfc
JOIN CumulativeCounts cc
ON yfc.name = cc.name AND yfc.food_year = cc.food_year
ORDER BY yfc.name, yfc.food_year;
The result seems to be in the correct format:
name food food_year food_count cumulative_count percentage
alex cookies 2005 1 1 100.00000
alex cookies 2006 1 2 50.00000
david apples 2020 1 1 100.00000
david apples 2021 1 2 50.00000
john pizza 2010 1 1 100.00000
john pizza 2011 1 2 50.00000
john cake 2012 1 3 33.33333
tim apples 2015 1 1 100.00000
Can someone please tell me if this is the correct way to approach this problem?
Thanks!