I have run an R package (scSHC) that, given a dataset and a list of clustering labels, re-analyzes the data and returns a data.tree object containing the statistically significant clusters, along with a list of the new clustering labels.
> print(test_scSHC_SCT_seu0.5[[2]], "level", "height")
levelName level height
1 Node 0: 0 1 9
2 ¦--Cluster 1 2 1
3 °--Node 1: 0 2 8
4 ¦--Cluster 2: 0.09 3 1
5 °--Node 2: 0 3 7
6 ¦--Cluster 3: 1 4 1
7 °--Node 3: 0 4 6
8 ¦--Cluster 4 5 1
9 °--Node 4: 0 5 5
10 ¦--Node 5: 0 6 4
11 ¦ ¦--Cluster 5 7 1
12 ¦ °--Node 7: 0 7 3
13 ¦ ¦--Cluster 6 8 1
14 ¦ °--Node 10: 0 8 2
15 ¦ ¦--Cluster 11 9 1
16 ¦ °--Cluster 12 9 1
17 °--Node 6: 0 6 3
18 ¦--Node 8: 0 7 2
19 ¦ ¦--Cluster 7 8 1
20 ¦ °--Cluster 8 8 1
21 °--Node 9: 0 7 2
22 ¦--Cluster 9 8 1
23 °--Cluster 10 8 1
In this case, the subdivisions of Clusters 2 and 3 did not pass the p-value < 0.05 significance threshold (the p-value is the number next to each label), so each of these clusters merges 2 of the original clustering labels.
Now, I want to generate a plot like this one (using ggplot2), but with the tree/dendrogram added along the X-axis and each element properly aligned:
(note, this plot uses the old labels and needs to be changed)
I was planning to use patchwork to do that, but I am having trouble generating a nice-looking tree/dendrogram in ggplot2 format.
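For the alignment step, here is a minimal sketch of how the two panels could be stacked with patchwork. It assumes the dendrogram leaves sit at integer positions 1..n (as in the dendro_data output further down) and that the scatter plot's X-axis is a factor whose levels follow the leaf order. The objects `tree_df`, `scatter_df` and `leaf_order` are toy stand-ins, not the real data, and the plotting part only runs if ggplot2 and patchwork are installed.

```r
# Toy stand-ins (hypothetical): three leaves and hand-written tree segments.
leaf_order <- c("Cluster 1", "Cluster 2", "Cluster 3")
n <- length(leaf_order)

tree_df <- data.frame(
  x    = c(2, 1, 2,   2.5, 2.5, 2, 2.5, 3),
  y    = c(2, 2, 2,   2,   1,   1, 1,   1),
  xend = c(1, 1, 2.5, 2.5, 2,   2, 3,   3),
  yend = c(2, 0, 2,   1,   1,   0, 1,   0)
)

scatter_df <- data.frame(
  cluster = factor(c("Cluster 2", "Cluster 1", "Cluster 3"), levels = leaf_order),
  value   = c(0.3, 0.8, 0.5)
)

# A discrete scale places level i at x = i, so giving both panels the
# range [0.5, n + 0.5] puts leaf i directly above category i.
x_limits <- c(0.5, n + 0.5)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("patchwork", quietly = TRUE)) {
  library(ggplot2)
  library(patchwork)

  p_tree <- ggplot(tree_df) +
    geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
    scale_x_continuous(limits = x_limits, expand = c(0, 0)) +
    theme_void()

  p_scatter <- ggplot(scatter_df, aes(x = cluster, y = value)) +
    geom_point() +
    scale_x_discrete(expand = expansion(add = 0.5))

  # "/" stacks the plots vertically; patchwork aligns the panel areas.
  combined <- p_tree / p_scatter + plot_layout(heights = c(1, 3))
}
```

Printing `combined` would draw the tree above the scatter plot. The matching X ranges are what keep the leaves over their categories; patchwork only takes care of lining up the panel borders.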
When I use the plot function with center = TRUE, I get a nice-looking tree:
> plot(as.dendrogram(test_scSHC_SCT_seu0.5[[2]]), center = TRUE)
However, both methods I have tried for generating ggplot objects from this tree draw the horizontal lines collapsed into the previous level, which looks very ugly and resembles the output of plot with center = FALSE:
plot(as.dendrogram(test_scSHC_SCT_seu0.5[[2]]))
1- Convert the data with as.dendrogram and use ggdendro::ggdendrogram to plot it:
> dd <- as.dendrogram(test_scSHC_SCT_seu0.5[[2]], heightAttribute = function(x) x$height )
> head(dd,20)
--[dendrogram w/ 2 branches and 12 members at h = 9]
|--leaf "Cluster 1" (h= 1 )
`--[dendrogram w/ 2 branches and 11 members at h = 8]
|--leaf "Cluster 2: 0.09" (h= 1 )
`--[dendrogram w/ 2 branches and 10 members at h = 7]
|--leaf "Cluster 3: 1" (h= 1 )
`--[dendrogram w/ 2 branches and 9 members at h = 6]
|--leaf "Cluster 4" (h= 1 )
`--[dendrogram w/ 2 branches and 8 members at h = 5]
|--[dendrogram w/ 2 branches and 4 members at h = 4]
| |--leaf "Cluster 5" (h= 1 )
| `--[dendrogram w/ 2 branches and 3 members at h = 3]
| |--leaf "Cluster 6" (h= 1 )
| `--[dendrogram w/ 2 branches and 2 members at h = 2]
| |--leaf "Cluster 11" (h= 1 )
| `--leaf "Cluster 12" (h= 1 )
`--[dendrogram w/ 2 branches and 4 members at h = 3]
|--[dendrogram w/ 2 branches and 2 members at h = 2]
| |--leaf "Cluster 7" (h= 1 )
| `--leaf "Cluster 8" (h= 1 )
`--[dendrogram w/ 2 branches and 2 members at h = 2]
|--leaf "Cluster 9" (h= 1 )
`--leaf "Cluster 10" (h= 1 )
etc...
> ggdendrogram(dd)
2- Convert the data to a data.frame using ggdendro::dendro_data and use ggplot to draw the segments:
> ddd <- dendro_data(dd, type = "rectangle")
> ddd
$segments
x y xend yend
1 6.5 9 1 9
2 1.0 9 1 1
3 6.5 9 2 9
4 2.0 9 2 8
5 7.5 8 2 8
6 2.0 8 2 1
7 7.5 8 3 8
8 3.0 8 3 7
9 8.0 7 3 7
10 3.0 7 3 1
11 8.0 7 4 7
12 4.0 7 4 6
13 8.5 6 4 6
14 4.0 6 4 1
15 8.5 6 5 6
16 5.0 6 5 5
17 9.0 5 5 5
18 5.0 5 5 4
19 9.0 5 9 5
20 9.0 5 9 3
21 7.0 4 5 4
22 5.0 4 5 1
23 7.0 4 6 4
24 6.0 4 6 3
25 7.5 3 6 3
26 6.0 3 6 1
27 7.5 3 7 3
28 7.0 3 7 2
29 8.0 2 7 2
30 7.0 2 7 1
31 8.0 2 8 2
32 8.0 2 8 1
33 11.0 3 9 3
34 9.0 3 9 2
35 11.0 3 11 3
36 11.0 3 11 2
37 10.0 2 9 2
38 9.0 2 9 1
39 10.0 2 10 2
40 10.0 2 10 1
41 12.0 2 11 2
42 11.0 2 11 1
43 12.0 2 12 2
44 12.0 2 12 1
$labels
x y label
1 1 0 Cluster 1
2 2 0 Cluster 2: 0.09
3 3 0 Cluster 3: 1
4 4 0 Cluster 4
5 5 0 Cluster 5
6 6 0 Cluster 6
7 7 0 Cluster 11
8 8 0 Cluster 12
9 9 0 Cluster 7
10 10 0 Cluster 8
11 11 0 Cluster 9
12 12 0 Cluster 10
$leaf_labels
NULL
$class
[1] "dendrogram"
attr(,"class")
[1] "dendro"
> ggplot(segment(ddd)) +
geom_segment(aes(x = x, y = y, xend = xend, yend = yend))
As far as I understand, my biggest issue is that the segments representing the jump from one level to the next are aligned to the left. For example, in the dendro_data object the marked segments should have more “centered” x and xend values:
> ddd
$segments
x y xend yend
1 6.5 9 1 9
2 1.0 9 1 1
3 6.5 9 2 9
4 2.0 9 2 8 <-
5 7.5 8 2 8
6 2.0 8 2 1
7 7.5 8 3 8
8 3.0 8 3 7 <-
9 8.0 7 3 7
10 3.0 7 3 1
11 8.0 7 4 7
12 4.0 7 4 6 <-
13 8.5 6 4 6
14 4.0 6 4 1
...
Is there a way to do this programmatically, or do I need to change the data manually? I have shown just one example, but I would need to repeat it for several clustering-depth levels. Also, how can I retrieve the names of all the leaves in the order in which they are plotted (to pass them to the scatter plot as its X-axis order)?
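To make the "centered" idea concrete, here is a minimal sketch that rebuilds the rectangle segments directly from a dendrogram, placing every internal node at the middle of the leaves beneath it. `centered_segments` is a hypothetical helper, not a ggdendro function, and the toy `hclust` tree only stands in for `as.dendrogram(test_scSHC_SCT_seu0.5[[2]])`. Whether it reproduces `plot(..., center = TRUE)` exactly on the real tree is an assumption to check.

```r
# Hypothetical helper: segments (x, y, xend, yend) for a rectangle dendrogram,
# with each internal node centred over the span of its leaves.
centered_segments <- function(dend) {
  rows <- list()     # one single-row data frame per segment
  leaf_count <- 0    # leaves are placed at x = 1, 2, 3, ... from left to right

  place <- function(node) {
    if (is.leaf(node)) {
      leaf_count <<- leaf_count + 1
      return(list(x = leaf_count, y = attr(node, "height"),
                  lo = leaf_count, hi = leaf_count))
    }
    # Place the children first so their positions are known.
    kids <- lapply(seq_along(node), function(i) place(node[[i]]))
    lo <- min(sapply(kids, `[[`, "lo"))
    hi <- max(sapply(kids, `[[`, "hi"))
    x <- (lo + hi) / 2
    y <- attr(node, "height")
    for (k in kids) {
      # Horizontal arm at the node's height, then a vertical drop to the child.
      rows[[length(rows) + 1]] <<- data.frame(x = x, y = y, xend = k$x, yend = y)
      rows[[length(rows) + 1]] <<- data.frame(x = k$x, y = y, xend = k$x, yend = k$y)
    }
    list(x = x, y = y, lo = lo, hi = hi)
  }

  place(dend)
  do.call(rbind, rows)
}

# Toy dendrogram standing in for the real tree.
dend <- as.dendrogram(hclust(dist(c(a = 1, b = 2, c = 4, d = 8))))
segs <- centered_segments(dend)
```

The result has the same columns as `segment(ddd)`, so it could be passed to the same `geom_segment(aes(x = x, y = y, xend = xend, yend = yend))` call. Centering over the leaf span is one choice; taking the mean of the direct children's x positions would be another.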
Thanks
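On the leaf order: for a dendrogram object, `labels()` returns the leaf labels from left to right, and the `$labels` table printed by dendro_data above lists them in the same order. A minimal sketch with a toy tree follows; `scatter_df` and the `sub()` pattern for stripping the ": p-value" suffix are assumptions for illustration.

```r
# Toy tree standing in for as.dendrogram(test_scSHC_SCT_seu0.5[[2]]).
hc <- hclust(dist(c(a = 1, b = 2, c = 4, d = 8)))
dend <- as.dendrogram(hc)

# Leaf labels in plotting order (left to right).
leaf_order <- labels(dend)

# Labels such as "Cluster 2: 0.09" carry a p-value suffix; drop it if the
# scatter plot uses the bare cluster names (assumed naming scheme).
clean <- sub(":.*$", "", c("Cluster 2: 0.09", "Cluster 3: 1", "Cluster 4"))

# Use the leaf order as factor levels so a discrete X-axis follows the tree.
scatter_df <- data.frame(cluster = c("b", "d", "a", "c"), value = 1:4)
scatter_df$cluster <- factor(scatter_df$cluster, levels = leaf_order)
```

With the factor levels set this way, ggplot2 draws the categories in tree order without any further axis arguments.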