I’ve been attempting to clean a dataset for a project, but ran into issues when trying to transform the data. The provided problem set contains a similar scenario to the real dataset. If you look closely at the [Original] dataset, the number of fruit and the color will both separately add up to the “total count” row, which is 10. My goal is to break those descriptive variables out of the data and align them each with their own columns, as seen in the [Desired Outcome]. This would allow me to visualization the data in Tableau with no issues.
Upon researching my problem, my initial thought with that the solution had to be associated with a pivot or unpivot function. However, because there seems to be two different descriptive variables within the data, I didn’t find much success. It’s worth noting that the original dataset is populated with hundreds of thousands of rows, so a simple creation and translation of a new table may not be easily executed.
Overall, I am completely unsure how to approach this problem. Any help or advice would be greatly appreciated.
Original:
Desired Outcome:
3
I stared doing this in oracle sql before I saw that you are using postgresql but maybe this can help you anyway? I assume postgresql do not allow 2 columns with the same name either…(orange)
Edit. postgresql solution
https://dbfiddle.uk/CNr1zblm
with data as (
Select 'Fruit' as Food,10 as Value, 'Total count' as Characteristics, null as Color, null as Location, null as "date" from dual union all
Select 'Fruit' ,3 , 'apple' , null , null , null from dual union all
Select 'Fruit' ,1 , 'orange' , null , null , null from dual union all
Select 'Fruit' ,4 , 'banana' , null , null , null from dual union all
Select 'Fruit' ,2 , 'cherry' , null , null , null from dual union all
Select 'Fruit' ,2 , null , 'red' , null , null from dual union all
Select 'Fruit' ,2 , null , 'yellow' , null , null from dual union all
Select 'Fruit' ,2 , 'cherry' , 'orange' , null , null from dual
)
select
Food,
max(case when Characteristics = 'Total count' then Value end) as "Total count",
max(case when Characteristics = 'apple' then Value end) as apple,
max(case when Characteristics = 'orange' then Value end) as orange,
max(case when Characteristics = 'banana' then Value end) as banana,
max(case when Characteristics = 'cherry' then Value end) as cherry,
max(case when Color = 'red' then Value end) as red,
max(case when Color = 'yellow' then Value end) as yellow,
max(case when Color = 'orange' then Value end) as orange_color
from data
group by Food;
1