I know I could easily use UNION ALL
to get the results I want, but I have to generate 5 tables of each containing 15 rows of metrics/KPI.
To simplify, I have data like this:
user | sex | state | race |
---|---|---|---|
usr1 | M | PA | B |
usr2 | F | TX | W |
usr3 | M | TX | B |
usr4 | D | NE | H |
usr5 | F | TX | A |
As a result, I need to format the data like:
Category | Total | Percentage |
---|---|---|
Total | 5 | 100% |
Male | 2 | 40% |
Female | 2 | 40% |
Other/Declined | 1 | 20% |
In State | 3 | 60% |
Out State | 2 | 40% |
Black | 2 | 40% |
White | 1 | 20% |
I know I could write a unioned query such as:
SELECT
'Total' AS Category
, COUNT(ud.user) AS Total
, '100%' AS Percentage
FROM UserData ud
UNION ALL
SELECT
'Male' AS Category
, SUM(CASE WHEN ud.sex = 'M' THEN 1 END) AS Total
, (SUM(CASE WHEN ud.sex = 'M' THEN 1 END) / COUNT(ud.user)) * 100 AS Percentage
FROM UserData ud
UNION ALL
...
FROM ...
UNION ...
If unions really are the only/best/most sensible way, so be it. I feel like there should be a way to do this with a relatively singular query rather than blocks of 15 unioned queries. Is there a way to do this with PIVOT/UNPIVOT or something else, or am I doomed to unions?
1
Using an UNPIVOT
, GROUP BY
, and a couple of COUNT(*)
functions (one a window function), you can get most of what you need:
SELECT
*,
Total * 100 / SUM(Total) OVER(PARTITION BY Attribute) AS Percentage
FROM (
SELECT Attribute, Category, COUNT(*) AS Total
FROM DATA D
UNPIVOT (
Category
FOR Attribute IN (sex, state, race)
) U
GROUP BY Attribute, Category
) G
ORDER BY Attribute, Total DESC, Category
But the above does not give you a “Total” row.
Instead, you can use a CROSS APPLY (VALUES...)
to perform the equivalent of an UNPIVOT
while adding an extra element to give you the “Total” row. This also allows your to more easily define the results order.
SELECT
*,
Total * 100 / SUM(Total) OVER(PARTITION BY Attribute) AS Percentage
FROM (
SELECT Ord, Attribute, Category, COUNT(*) AS Total
FROM DATA D
CROSS APPLY (
VALUES
(1, 'Total', 'Total'),
(2, 'sex', sex),
(3, 'state', state),
(4, 'race', race)
) V(Ord, Attribute, Category)
GROUP BY Ord, Attribute, Category
) G
ORDER BY Ord, Total DESC, Category
Results:
Ord | Attribute | Category | Total | Percentage |
---|---|---|---|---|
1 | Total | Total | 5 | 100 |
2 | sex | F | 2 | 40 |
2 | sex | M | 2 | 40 |
2 | sex | D | 1 | 20 |
3 | state | TX | 3 | 60 |
3 | state | NE | 1 | 20 |
3 | state | PA | 1 | 20 |
4 | race | B | 2 | 40 |
4 | race | A | 1 | 20 |
4 | race | H | 1 | 20 |
4 | race | W | 1 | 20 |
The above does not map the source data to the specific values you have shown, but that should be a simple matter of adding a few joins to the associated lookup tables and replacing the specific value referenced below as needed. You can of course also omit the result columns that you do not need.
See this db<>fiddle for a demo.
This can be done with grouping sets
:
with DataCategories as (
select
case sex when 'M' then 'Male' when 'F' then 'Female' else 'Other/Declined' end as SexCategory,
case state when 'TX' then 'In State' else 'Out State' end as RegionCategory,
case race when 'A' then 'Asian' when 'B' then 'Black' when 'W' then 'White' when 'H' then 'Hispanic' else race end as RaceCategory,
count(*) over () as Total
from Data
)
select
coalesce(SexCategory, RegionCategory, RaceCategory, 'Total') as Category,
count(*) as Total,
cast(count(*) * 100.0 / min(Total) as decimal(5, 2)) as Percentage,
/* another way to count the total rows dynamically */
sum(count(*)) over (partition by grouping_id(SexCategory, RegionCategory, RaceCategory)) as Total2
from DataCategories
group by grouping sets ( (), (SexCategory), (RegionCategory), (RaceCategory) )
order by coalesce(nullif(grouping_id(SexCategory, RegionCategory, RaceCategory), 7), 1);
https://dbfiddle.uk/QQ2cj_I9
One possible advantage to this approach is that you wouldn’t necessarily need to list out all the values for the unpivot operation. If say a new race code snuck into the data then it could just be left as is/untranslated and the category would still appear. And it’s even easier for values that should display unchanged.
You can use unpivot and grouping.
See example
select data,case when data<>'total' then info else 'Total' end category
,count(distinct userid) qty
,count(userid)*100.0/max(case when data='total' then cast(info as int) end )over() pct
from(
SELECT userid,data,info FROM (
SELECT userid -- cast(userid as varchar(10)) userid
,CAST(case when sex='M' then 'Male' when sex='F' then 'Female'
else 'Other/Declined' end
AS varchar(10)) sex
,CAST(case when state='TX' then 'In State'
else 'Out State' end AS varchar(10)) state
,CAST(case when race='W' then 'White' when race='B' then 'Black'
when race='H' then 'Hispanic'
else 'Undefined race' end
AS varchar(10)) race
,cast(count(*)over() as varchar(10)) total
FROM userdata
) x
UNPIVOT( info
FOR data IN( sex, state, race,total)
) unpvt
)t
group by data,info -- with rollup
order by data;
data | category | qty | pct |
---|---|---|---|
race | Black | 2 | 40.000000000000 |
race | Hispanic | 1 | 20.000000000000 |
race | Undefined | 1 | 20.000000000000 |
race | White | 1 | 20.000000000000 |
sex | Female | 2 | 40.000000000000 |
sex | Male | 2 | 40.000000000000 |
sex | Other/Decl | 1 | 20.000000000000 |
state | In State | 3 | 60.000000000000 |
state | Out State | 2 | 40.000000000000 |
total | Total | 5 | 100.000000000000 |
Unpivoted output
userid | category | info |
---|---|---|
usr1 | sex | Male |
usr1 | state | Out State |
usr1 | race | Black |
usr1 | total | 5 |
usr2 | sex | Female |
usr2 | state | In State |
usr2 | race | White |
usr2 | total | 5 |
usr3 | sex | Male |
usr3 | state | In State |
usr3 | race | Black |
usr3 | total | 5 |
usr4 | sex | Other/Decl |
usr4 | state | Out State |
usr4 | race | Hispanic |
usr4 | total | 5 |
usr5 | sex | Female |
usr5 | state | In State |
usr5 | race | Undefined |
usr5 | total | 5 |
fiddle
It seems the solution here is just a simple UNPIVOT and I was struggling to wrap my head around/conceptualize it. I think needing the percentage was making this seem more complicated than it needed to be.
None of these answers really got me to the result I needed, however this is what I ended up using which generates the result table in my question perfectly. I’ll wait a little bit to see if a better solution is posted before accepting this answer.
WITH upvt AS (
SELECT *
FROM (
SELECT
COUNT(ud.userid) AS Total
, SUM(CASE WHEN ud.sex = 'M' THEN 1 END) AS Male
, SUM(CASE WHEN ud.sex = 'F' THEN 1 END) AS Female
, SUM(CASE WHEN ud.sex = 'D' THEN 1 END) AS Declined
, SUM(CASE WHEN ud.state = 'tx' THEN 1 END) AS [In State]
, SUM(CASE WHEN ud.state <> 'tx' THEN 1 END) AS [Out State]
, SUM(CASE WHEN ud.race = 'B' THEN 1 END) AS Black
, SUM(CASE WHEN ud.race = 'W' THEN 1 END) AS White
, SUM(CASE WHEN ud.race = 'H' THEN 1 END) AS Hispanic
, SUM(CASE WHEN ud.race = 'A' THEN 1 END) AS Asian
FROM UserData ud
) p
UNPIVOT (
Totals FOR Category IN (Total, Male, Female, Declined, [In State], [Out State], Black, White, Hispanic, Asian)
) AS up
)
SELECT
u.Category
, u.Totals
, (CAST(u.Totals AS FLOAT) / CAST(u2.Totals AS FLOAT)) * 100 AS Percentage
FROM upvt u
CROSS JOIN (
SELECT * FROM upvt WHERE Category = 'Total'
) u2
SQL Fiddle