I’m using ggsankeyfier functions to make a Sankey diagram, which may not be the best solution for this, but it is the best I have so far.
I’ve managed to create this plot from this example data:
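library(dplyr)         # %>%, mutate
library(tidyr)         # pivot_wider
library(stringr)       # str_wrap
library(ggplot2)
library(ggsankeyfier)  # pivot_stages_longer, geom_sankeyedge, geom_sankeynode

# Example data: 7 clients, one class (0-3) per client per year
t = data.frame(client = rep(c('a','b','c','d','e','f','g'), each = 4),
               year = rep(2021:2024, 7),
               class = sample(0:3, size = 28, replace = TRUE, prob = c(0.6, 0.3, 0.09, 0.1)))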
# Note: `pos` and `pos_text` are not defined in this snippet; they are assumed
# to be position_sankey() objects created beforehand.
t %>%
  pivot_wider(names_from = year, values_from = class) %>%
  mutate(final_class = 1) %>%  # constant weight: each client counts as 1
  pivot_stages_longer(as.character(2021:2024), values_from = 'final_class') %>%
  ggplot(aes(x = stage, y = final_class, group = node,
             connector = connector, edge_id = edge_id)) +
  geom_sankeyedge(aes(fill = node), position = pos, ncp = 1) +
  geom_sankeynode(aes(fill = node), position = pos) +
  geom_text(aes(label = str_wrap(node, 20)), position = pos_text,
            stat = "sankeynode", hjust = 0, cex = 2) +
  theme_minimal()
My goal is to track the flow of clients between classes across years. This plot is OK for my purposes, but I have two problems with it. First, it requires creating the final_class variable, which seems redundant. Second, the plot puts the nodes in increasing order; I’d like to have them in class order (0 at the bottom, 3 at the top, etc.). I tried setting y = node, but then the nodes lose their width.
What could I be doing better?