I’m working on creating a heatmap in R using ggplot2 to visualize data on amino acid properties across different transmembrane regions (TM) and taxa. I’ve managed to create the heatmap, but I’m facing an issue with the facets.
My dataset has three key columns: characteristic (amino acid properties), tm (transmembrane regions), and taxa (taxa or species). Each combination of characteristic, tm, and taxa has a value in a fourth column, number, giving the count of residual pairs.
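Since the real data is only available through the link, here is a small made-up data frame in the same long format, useful for testing ideas. The level names (`OH`, `TM1`, `A`, …) and all counts are hypothetical, not taken from the real CSV. Taxon `B` has only zeros for `TM2`, which is exactly the case the question is about.

```r
library(dplyr)

# Hypothetical toy data in the question's long format:
# one row per characteristic x tm x taxa, with a count in `number`.
toy <- expand.grid(
  characteristic = c("OH", "P", "AR"),
  tm = c("TM1", "TM2", "TM3"),
  taxa = c("A", "B"),
  stringsAsFactors = FALSE
)
# expand.grid varies the first column fastest, so counts are listed
# per taxa, then per tm, then per characteristic (made-up values).
toy$number <- c(
  1, 0, 2,  0, 3, 1,  4, 0, 0,  # taxa A: TM1, TM2, TM3
  2, 1, 0,  0, 0, 0,  0, 5, 1   # taxa B: TM1, TM2 (all zero), TM3
)

# Per-(taxa, tm) totals: a total of 0 marks a row that should not be drawn.
totals <- toy %>%
  group_by(taxa, tm) %>%
  summarise(total = sum(number), .groups = "drop")
print(totals)
```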
Here’s my raw data (Google Drive link):
Raw data in CSV format
What I’m aiming for is one facet per taxon, where a facet includes a tm value only if that tm has at least one non-zero number value for that taxon. A tm value that is zero for every characteristic in a given taxon should be left out of that taxon’s facet.
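The rule can be written as a single grouped filter, sketched here on made-up data rather than the real CSV: group by `taxa` and `tm`, and keep a group only if at least one `number` in it is non-zero. For non-negative counts this does the same job as the `summarise()` + `semi_join()` pair in the code below, in one step.

```r
library(dplyr)

# Hypothetical long-format data: taxon B has only zeros in TM2.
toy <- data.frame(
  characteristic = rep(c("OH", "P"), times = 4),
  tm = rep(rep(c("TM1", "TM2"), each = 2), times = 2),
  taxa = rep(c("A", "B"), each = 4),
  number = c(1, 0, 2, 3,   # taxa A: TM1 (OH, P), TM2 (OH, P)
             4, 1, 0, 0),  # taxa B: TM1 (OH, P), TM2 (OH, P)
  stringsAsFactors = FALSE
)

# Keep a (taxa, tm) block only if it has at least one non-zero count.
toy_filtered <- toy %>%
  group_by(taxa, tm) %>%
  filter(any(number != 0)) %>%
  ungroup()

print(toy_filtered)
```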
I’ve tried various approaches, including filtering the dataset before plotting, but I haven’t been successful in achieving the desired result. The facets still include tm values that have zero counts for all characteristics.
Here’s the code I’m currently using to create the heatmap:
# Load necessary libraries
library(ggplot2)
library(reshape2)
library(RColorBrewer)
library(tidyr) # For the complete function
library(dplyr) # For data manipulation
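# Read the raw CSV (the file name below is a placeholder; point it at the downloaded file)
data <- read.csv("raw_data.csv", stringsAsFactors = FALSE)
# Display the structure of the dataframe to verify
str(data)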
# Define the substitutions
substitutions <- c(
"Hydroxyl" = "OH",
"Polar" = "P",
"Uncharged" = "UC",
"Aliphatic" = "AL",
"Nonpolar" = "NP",
"Hydrophobic" = "HP",
"Aromatic" = "AR",
"Sulfur" = "SU",
"Basic" = "BA",
"Charged" = "CH",
"Hydrophilic" = "HI",
"Acidic" = "AC",
"Amide" = "AM",
"Small" = "S",
"Methionine" = "M"
)
# Perform substitutions in the 'characteristic' column
data$characteristic <- as.character(data$characteristic)
for (pattern in names(substitutions)) {
data$characteristic <- gsub(pattern, substitutions[pattern], data$characteristic)
}
# Ensure that all combinations of characteristic, tm, and taxa are present
# Fill missing combinations with 0
all_combinations <- expand.grid(
characteristic = unique(data$characteristic),
tm = unique(data$tm),
taxa = unique(data$taxa),
stringsAsFactors = FALSE # keep character columns so the join keys match 'data'
)
# Merge with the original data and fill missing values with 0
data_complete <- left_join(all_combinations, data, by = c("characteristic", "tm", "taxa"))
data_complete[is.na(data_complete)] <- 0
# Calculate the total number for each taxa and tm combination
taxa_tm_totals <- data_complete %>%
group_by(taxa, tm) %>%
summarise(total = sum(number), .groups = "drop")
# Filter out tm values where the total number is 0 for a specific taxa
data_filtered <- data_complete %>%
semi_join(taxa_tm_totals %>% filter(total != 0), by = c("taxa", "tm"))
# Filter out taxa where all tm values have 0 number
taxa_filtered <- data_filtered %>%
group_by(taxa) %>%
filter(sum(number) != 0) %>%
distinct(taxa)
# Define the order of the taxa
taxa_order <- taxa_filtered$taxa
# Convert the taxa column to a factor with the specified levels
data_filtered$taxa <- factor(data_filtered$taxa, levels = taxa_order)
# Define the colors for the heatmap
colors <- c("#ffffff", "#f7f9cb", "#c7edc0", "#92e0c3", "#56d0d0", "#00bdde", "#00afef", "#539df4", "#9085e8", "#cd6ace", "#f54a9d", "#ff3d5e", "#f0550d", "#fc8d59")
# Create the heatmap using ggplot2 and facet by 'taxa' with shared axes
ggplot(data_filtered, aes(x = characteristic, y = tm, fill = as.numeric(number))) +
geom_tile(color = "black", linewidth = 0.2) + # 'linewidth' replaces the deprecated 'size' (ggplot2 >= 3.4.0)
geom_text(aes(label = ifelse(number != 0, label, "")), size = 3) +
scale_fill_gradientn(colors = colors) +
theme(
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
panel.border = element_rect(color = "black", fill = NA, linewidth = 1),
axis.line = element_line(color = "black"),
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
axis.title = element_text(face = "bold", size = 12),
axis.text = element_text(size = 10, face = "bold"),
legend.title = element_text(face = "bold", size = 12),
legend.text = element_text(size = 10, face = "bold"),
legend.box.background = element_rect(color = "black", linewidth = 1),
strip.background = element_blank(), # Remove the background of facet labels
strip.text.y.right = element_text(size = 10, face = "bold", angle = 0), # Style the facet labels on the right
strip.placement = "outside" # Place facet strips outside
) +
labs(
title = "Heatmap of Amino Acid Properties",
x = "Properties",
y = "Transmembrane Regions",
fill = "Number of\nResidual Pairs"
) +
facet_wrap(~ taxa, ncol = 1, strip.position = "right") +
theme(panel.spacing = unit(0, "lines")) # Remove space between facets
This is the plot that I get:
![Result](https://i.sstatic.net/rUnasDVk.png)
Instead of completely removing a tm value from a taxon’s facet, the plot just shows an empty row for it. How can I fix this?
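A likely explanation, sketched on made-up data and not checked against the real CSV: the filtering does remove the rows, but `facet_wrap()` uses `scales = "fixed"` by default, so all panels share one y scale trained on the `tm` values of every taxon together. A `tm` that survives in any taxon therefore gets a row in every panel, and it is drawn empty in the panels where it was filtered out. `layer_scales()` lets you inspect the y values a given panel actually uses.

```r
library(ggplot2)

# Hypothetical already-filtered data: taxon B has no TM2 rows left.
toy_filtered <- data.frame(
  characteristic = c("OH", "P", "OH", "P", "OH", "P"),
  tm = c("TM1", "TM1", "TM2", "TM2", "TM1", "TM1"),
  taxa = c("A", "A", "A", "A", "B", "B"),
  number = c(1, 0, 2, 3, 4, 1),
  stringsAsFactors = FALSE
)

# Same faceting as in the question: one column, default (fixed) scales.
p_fixed <- ggplot(toy_filtered, aes(x = characteristic, y = tm, fill = number)) +
  geom_tile(color = "black") +
  facet_wrap(~ taxa, ncol = 1, strip.position = "right")

# y values of the panel in row 2 (taxon B): TM2 is still included,
# because the y scale is shared with taxon A.
print(layer_scales(p_fixed, i = 2, j = 1)$y$get_limits())
```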
Desired output:
I want to completely remove a tm value from a taxon’s facet if that tm is zero for all characteristics in that taxon.
I’d greatly appreciate any insights or suggestions on how to exclude these rows from the heatmap, so that each facet includes only the tm values that have at least one non-zero count for that taxon. If there’s a more effective or alternative way to accomplish this, I’m eager to hear about it.
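One possible fix, shown as a minimal sketch on made-up data: give each panel its own y scale. `facet_grid(taxa ~ ., scales = "free_y", space = "free_y")` keeps only the `tm` values present in each taxon and makes panel heights proportional to the number of rows, so tiles have the same height in every facet. Row strips in `facet_grid()` sit on the right by default, matching the `strip.position = "right"` used in the question. Adding `scales = "free_y"` to the existing `facet_wrap()` call should also remove the empty rows, but by default the panels keep equal heights, so tiles in taxa with fewer `tm` rows come out taller.

```r
library(ggplot2)

# Hypothetical already-filtered data: taxon B has no TM2 rows left.
toy_filtered <- data.frame(
  characteristic = c("OH", "P", "OH", "P", "OH", "P"),
  tm = c("TM1", "TM1", "TM2", "TM2", "TM1", "TM1"),
  taxa = c("A", "A", "A", "A", "B", "B"),
  number = c(1, 0, 2, 3, 4, 1),
  stringsAsFactors = FALSE
)

# Free y scales: each taxon's panel only gets the tm values it has.
# space = "free_y" sizes each panel by its number of tm rows.
p_free <- ggplot(toy_filtered, aes(x = characteristic, y = tm, fill = number)) +
  geom_tile(color = "black") +
  facet_grid(taxa ~ ., scales = "free_y", space = "free_y") +
  theme(
    strip.text.y.right = element_text(angle = 0),
    panel.spacing = unit(0, "lines")
  )

# y values per panel: taxon A keeps TM1 and TM2, taxon B keeps only TM1.
print(layer_scales(p_free, i = 1, j = 1)$y$get_limits())
print(layer_scales(p_free, i = 2, j = 1)$y$get_limits())
```

In the full code, the `facet_wrap(...)` line would be swapped for the `facet_grid(...)` line; the existing theme settings should carry over unchanged.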