I have not found a way to group_by() a column and have a struct field for the new column.Here is a snippet of the dataframe:
Census_Tract GEOID_Census table_B24012 value
str str str f64
"Census Tract 9… "04001942600" "B24012_001E" 32179.0
"Census Tract 9… "04001942700" "B24012_001E" 33483.0
"Census Tract 9… "04001944000" "B24012_001E" 33114.0
"Census Tract 9… "04001944100" "B24012_001E" 26993.0
"Census Tract 9… "04001944201" "B24012_001E" 33798.0
… … … …
"Census Tract 2… "56043000200" "B24012_073E" -6.6667e8
"Census Tract 3… "56043000301" "B24012_073E" -6.6667e8
"Census Tract 3… "56043000302" "B24012_073E" -6.6667e8
"Census Tract 9… "56045951100" "B24012_073E" 41042.0
"Census Tract 9… "56045951300" "B24012_073E" -6.6667e8
Essentially what I am trying to do is to group by Census_Tract
and GEOID_Census
. The columns table_B24012 and value should be combined into one column.
In other words I would like the data to be a struct object such that for each GEOID_Census I have one row value with potential a dictionary structure built into the new column. It would look something like this
Census_Tract GEOID_Census table_B24012
str str str
"Census Tract 9… "04001942600" {"B24012_001E": 32179.0, "B24012_002E":..., ...}
"Census Tract 9… "04001942700" {"B24012_001E" 33483.0, "B24012_002E":..., ...}
"Census Tract 9… "04001944000" {"B24012_001E" 33114.0, "B24012_002E":..., ...}
"Census Tract 9… "04001944100" {"B24012_001E" 26993.0, "B24012_002E":..., ...}
"Census Tract 9… "04001944201" {"B24012_001E" 33798.0, "B24012_002E":..., ...}
If there is a more efficient way to store it in the column, I am open to other ways. I am just trying to condense the information.
I have tried casting the two columns to a list and then struct, as well as just to a dictionary. Nothing has helped. Any ideas? Thanks in advance!
Isaac Beaumont is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.