I am trying to make a heatmap of my samples with hierarchical clustering and a dendrogram.
I'm following this StackOverflow thread: "Merging multiple hclust objects (or dendrograms)".
I have a large data frame:
head(joined_df_sorted)
# A tibble: 6 × 24
chrom start end EE85756 EE85757 EE85770 EE85775 EE85784 EE87786 EE87787 EE87788 EE87789 EE87790 EE87811 EE87812 EE87813 EE87814 EE87815
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 chr1 1000001 2000000 3 4 3 3 3 3 3 3 3 3 3 3 3 3 3
2 chr1 3000001 4000000 3 4 3 3 3 8 3 3 3 3 3 3 3 3 3
3 chr1 4000001 5000000 3 4 3 3 3 13 3 3 8 3 3 3 3 3 6
4 chr1 5000001 6000000 3 4 3 3 3 10 3 3 7 3 3 3 3 3 3
5 chr1 6000001 7000000 3 4 3 3 3 3 3 3 3 3 3 3 3 3 3
6 chr1 7000001 8000000 3 4 3 3 3 7 3 3 3 3 3 3 3 3 3
What I have done so far:
# Join chrom and start to make a unique row name
joined_df_sorted$chromid <- paste0(joined_df_sorted$chrom, "_", joined_df_sorted$start)
# Keep only the required columns (chromid first, then the sample columns)
joined_df_sorted2 <- as.data.frame(joined_df_sorted[, c(24, 4:23)])
# Use the first column (chromid) as the row names
joined_df_sorted3 <- joined_df_sorted2
joined_df_sorted3X <- joined_df_sorted3[, -1]
rownames(joined_df_sorted3X) <- joined_df_sorted3[, 1]
# Transpose the data frame without changing the data types
# install.packages("sjmisc")  # only needed once
library(sjmisc)
joined_df_sorted3X_t <- joined_df_sorted3X %>%
  rotate_df(cn = FALSE)
CanCohortDat <- joined_df_sorted3X_t
BileDuct <- c("EE87786", "EE87787", "EE87788", "EE87789", "EE87790")
Breast <- c("EE87811", "EE87812", "EE87813", "EE87814", "EE87815")
Gastric <- c("EE87893", "EE87894", "EE87895", "EE87896", "EE87897")
Healthy <- c("EE85756", "EE85757", "EE85770", "EE85775", "EE85784")
# Cluster each of the 4 cohorts separately
h1 <- hclust(dist(CanCohortDat[BileDuct, ]))
h2 <- hclust(dist(CanCohortDat[Breast, ]))
h3 <- hclust(dist(CanCohortDat[Gastric, ]))
h4 <- hclust(dist(CanCohortDat[Healthy, ]))
# Merge the 4 dendrograms into a single hclust object
hc <- as.hclust(merge(merge(merge(
  as.dendrogram(h1), as.dendrogram(h2)), as.dendrogram(h3)),
  as.dendrogram(h4)))
# Put the rows in the same cohort order as the merged dendrogram
CanCoh <- CanCohortDat[c(BileDuct, Breast, Gastric, Healthy), ]
# Row annotation giving the cohort of each sample
cohort_annotation <- data.frame(Region = c(rep("BileDuct", length(BileDuct)),
                                           rep("Breast", length(Breast)),
                                           rep("Gastric", length(Gastric)),
                                           rep("Healthy", length(Healthy))),
                                row.names = c(BileDuct, Breast, Gastric, Healthy))
# Heatmap
pheatmap(CanCoh, cluster_rows = hc,
         annotation_row = cohort_annotation)
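For reference, the merging step on its own can be reproduced on toy data. This is only a minimal sketch of the technique from the linked thread, using base R; the matrix `toy`, the sample names `S1` to `S10`, and the groups `grpA` and `grpB` are made up for illustration and are not my real data.

```r
# Toy data: 10 hypothetical samples (rows) x 6 bins, split into two groups of 5
set.seed(1)
toy <- matrix(rnorm(60), nrow = 10,
              dimnames = list(paste0("S", 1:10), paste0("bin", 1:6)))
grpA <- paste0("S", 1:5)
grpB <- paste0("S", 6:10)

# Cluster each group separately
hA <- hclust(dist(toy[grpA, ]))
hB <- hclust(dist(toy[grpB, ]))

# Merge the two dendrograms and convert the result back to an hclust object
hc_toy <- as.hclust(merge(as.dendrogram(hA), as.dendrogram(hB)))

# Leaf labels in plotting order: the leaves of each group stay together
hc_toy$labels[hc_toy$order]
```

On this toy input the merge and the conversion back to `hclust` run without any NA problem, so the merging approach itself does not seem to be what fails.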
Running the pheatmap call above gives the following messages and error:
`use_raster` is automatically set to TRUE for a matrix with more than 2000
columns You can control `use_raster` argument by explicitly setting TRUE/FALSE
to it.
Set `ht_opt$message = FALSE` to turn off this message.
'magick' package is suggested to install to give better rasterization.
Set `ht_opt$message = FALSE` to turn off this message.
Error in hclust(get_dist(t(submat), distance), method = method) :
NA/NaN/Inf in foreign function call (arg 10)
In addition: Warning message:
The input is a data frame, convert it to the matrix.
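As far as I understand, `hclust()` raises this kind of error when the distances it receives contain NA, NaN or Inf, so one generic check is to count the non-finite values in the table before plotting. The sketch below is only a diagnostic under that assumption; `check_finite` is a hypothetical helper written for illustration (not from any package), and `toy_df` is made-up data.

```r
# Hypothetical helper: summarise values that dist()/hclust() cannot handle
check_finite <- function(x) {
  m <- as.matrix(x)                      # data frame -> matrix
  if (!is.numeric(m)) stop("matrix is not numeric; check the column types")
  bad <- !is.finite(m)                   # TRUE for NA, NaN, Inf and -Inf
  list(
    n_bad    = sum(bad),                           # total non-finite cells
    bad_rows = rownames(m)[rowSums(bad) > 0],      # rows containing any
    bad_cols = colnames(m)[colSums(bad) > 0]       # columns containing any
  )
}

# Toy example: row "s3" does not exist, so indexing by name returns an all-NA row
toy_df <- data.frame(a = c(1, 2), b = c(3, 4), row.names = c("s1", "s2"))
res <- check_finite(toy_df[c("s1", "s3"), ])
res$n_bad
```

The same call on `CanCoh` would show whether any sample rows or chromosome columns contain non-finite values, for example because a sample ID used for indexing is not among the row names.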
My output should look like this: the EE* samples should be on the y-axis, grouped by the merged dendrogram, and the x-axis should show the chromosome location.