I’m using Python 3.10 with Plotly.
I have a DataFrame called “cluster_user_distribution_data” with around 40 rows and a number of columns that varies with the analysis; let’s say there are four. It looks like this:
| | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|---|
| Almtunaskolan | 31.4 | 6.2 | 24.8 | 37.6 |
| Almunge skola | 40.1 | 6.2 | 23.6 | 30.2 |
| Bergaskolan (Videskolan) | 32.6 | 6.2 | 26.0 | 35.1 |
| Danmarks skola | 32.2 | 6.2 | 20.2 | 41.3 |
| ... | | | | |
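For anyone who wants to reproduce this, a minimal version of the DataFrame built from only the four rows shown above (the real data has around 40 rows) could look like this:

```python
import pandas as pd

# Only the four sample rows from the table; values are percentages per cluster.
cluster_user_distribution_data = pd.DataFrame(
    {
        "Cluster 1": [31.4, 40.1, 32.6, 32.2],
        "Cluster 2": [6.2, 6.2, 6.2, 6.2],
        "Cluster 3": [24.8, 23.6, 26.0, 20.2],
        "Cluster 4": [37.6, 30.2, 35.1, 41.3],
    },
    index=["Almtunaskolan", "Almunge skola", "Bergaskolan (Videskolan)", "Danmarks skola"],
)
print(cluster_user_distribution_data)
```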
From this I want to make a stacked bar plot showing the distribution over the clusters, with separate subplots that group the users (the index names) by their shared majority cluster. So far so good; it looks like this:
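A sketch of how the majority cluster per user and the per-subplot groups could be computed, assuming the majority cluster is stored as a 0-based column position (which matches how `i` is used with `x_data.columns[i]` in the code below). The names `df`, `largest_cluster` and `groups` are hypothetical. Using a plain NumPy array for the boolean mask, rather than an index-aligned Series, keeps the selection working even after the index stops being unique.

```python
import numpy as np
import pandas as pd

# Hypothetical names: df, largest_cluster, groups (sample rows from the question).
df = pd.DataFrame(
    {
        "Cluster 1": [31.4, 40.1, 32.6, 32.2],
        "Cluster 2": [6.2, 6.2, 6.2, 6.2],
        "Cluster 3": [24.8, 23.6, 26.0, 20.2],
        "Cluster 4": [37.6, 30.2, 35.1, 41.3],
    },
    index=["Almtunaskolan", "Almunge skola", "Bergaskolan (Videskolan)", "Danmarks skola"],
)

# 0-based position of the largest column in each row, as a NumPy array,
# so boolean masks do not depend on the index labels being unique.
largest_cluster = df.to_numpy().argmax(axis=1)

# Clusters that are the majority for at least one user (for these rows: [0, 3]).
clusters_with_majority = np.unique(largest_cluster)

# One group of users per majority cluster, i.e. one subplot each.
groups = {int(c): df[largest_cluster == c] for c in clusters_with_majority}
for c, group in groups.items():
    print(df.columns[c], list(group.index))
```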
[Image: current stacked bar plot]
But I want to make the user names anonymous, so I have renamed them according to how “large” each user is. Now the DataFrame looks like this instead:
| | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|---|
| Medium school | 31.4 | 6.2 | 24.8 | 37.6 |
| Medium school | 40.1 | 6.2 | 23.6 | 30.2 |
| Large school | 32.6 | 6.2 | 26.0 | 35.1 |
| Small school | 32.2 | 6.2 | 20.2 | 41.3 |
| ... | | | | |
And here the problem begins. Plotly now merges bars whose names are the same, and since there are only three distinct names, the plot now looks like this (I’ve rescaled the x-axis to show how the data is now wrongly aggregated):
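The merging follows from the index no longer being unique: on a categorical axis, Plotly gives each distinct string a single position, so every bar with the same name lands on the same row. A small check on the renamed sample rows (the names `anon`, `positions` and `labels` are hypothetical) shows that the row position is still a unique key, while the anonymous name can serve purely as a tick label:

```python
import pandas as pd

# Hypothetical name: anon, the renamed sample rows from the question.
anon = pd.DataFrame(
    {
        "Cluster 1": [31.4, 40.1, 32.6, 32.2],
        "Cluster 2": [6.2, 6.2, 6.2, 6.2],
        "Cluster 3": [24.8, 23.6, 26.0, 20.2],
        "Cluster 4": [37.6, 30.2, 35.1, 41.3],
    },
    index=["Medium school", "Medium school", "Large school", "Small school"],
)

print(anon.index.is_unique)  # False: two rows share "Medium school"
print(anon.index.nunique())  # 3 distinct labels, so only 3 categories on the y-axis

# Hypothetical helpers: the row position stays unique after renaming,
# and the anonymous name is kept separately for display.
positions = list(range(len(anon)))
labels = anon.index.tolist()
print(list(zip(positions, labels)))
```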
[Image: stacked bar plot with merged names]
I have searched for a solution and found a similar case that worked for a non-stacked bar chart, but I cannot make it work with mine. I have tried to implement it like this (full code at the bottom):
```python
fig.update_layout(
    yaxis=dict(
        tickmode='array',
        tickvals=np.arange(0, len(y_data)),
        ticktext=y_data,
    )
)
```
But when I use it together with barmode="stack", the plot becomes messy. It looks like this (showing only the first subplot):
[Image: first subplot with the tickvals/ticktext workaround and stacked bars]
The desired outcome is a plot that looks like the first image, but with the anonymous names (small/medium/large school). That is, each user should keep its own bar even when several users share the same name.
The code I use is very similar to the “Color Palette for Bar Chart” example. My current code (without the color manipulation, scaling, etc.) looks like this:
```python
import numpy as np
import plotly.graph_objects as go

# fig, row_number, colors_loop and bar_plot_line_width come from the omitted parts of the code.
# clusters_with_majority is an array with the unique clusters that are the majority among at least one user, e.g. [0, 1, 3]
# cluster_user_distribution_data_largest_cluster is a series with the majority cluster for each user
for i in np.sort(clusters_with_majority):
    x_data = cluster_user_distribution_data.loc[cluster_user_distribution_data_largest_cluster == i]
    # sort ascending; horizontal bars are drawn bottom-up, so the largest value ends up on top
    x_data = x_data.sort_values(x_data.columns[i])
    # get the sorted index list and the data as an array for use in the bar plot loop
    y_data = x_data.index.tolist()
    x_data = x_data.values
    # shift the cluster in focus to the first column
    x_data[:, [0, i]] = x_data[:, [i, 0]]
    for j in range(0, len(x_data[0])):
        for xd, yd in zip(x_data, y_data):
            fig.add_trace(
                go.Bar(
                    x=[xd[j]], y=[yd],
                    orientation='h',
                    marker=dict(
                        color=colors_loop[j],
                        line=dict(color='black', width=bar_plot_line_width),
                    ),
                    showlegend=False,
                ),
                row=row_number,
                col=1,
            )
```
I’ve looked for more similar cases, but none of the working solutions I’ve found were for stacked bars.