Here is some simulated data I have in R:
library(ggplot2)
library(ggrepel)
set.seed(123)
data1 <- rnorm(100)
data2 <- rnorm(100)
data3 <- rnorm(100)
df <- data.frame(
  Category = rep(c("Data 1", "Data 2", "Data 3"), each = 100),
  Values = c(data1, data2, data3)
)
Here are summary stats for the data:
library(dplyr)
summary_stats <- df %>%
  group_by(Category) %>%
  summarise(
    lower = quantile(Values, 0.25),
    middle = median(Values),
    upper = quantile(Values, 0.75),
    mean = mean(Values)
  )
Using tutorials such as this previous question (How to create a grouped boxplot in R?), I made a boxplot (after much trial and error):
ggplot(df, aes(x = Category, y = Values, fill = Category)) +
  geom_boxplot() +
  geom_text_repel(
    data = summary_stats,
    aes(x = Category, y = middle, label = paste("Median:", round(middle, 2))),
    nudge_y = 0.2,
    size = 3
  ) +
  geom_text_repel(
    data = summary_stats,
    aes(x = Category, y = lower, label = paste("Lower Quartile:", round(lower, 2))),
    nudge_y = -0.2,
    size = 3
  ) +
  geom_text_repel(
    data = summary_stats,
    aes(x = Category, y = upper, label = paste("Upper Quartile:", round(upper, 2))),
    nudge_y = 0.2,
    size = 3
  ) +
  geom_text_repel(
    data = summary_stats,
    aes(x = Category, y = mean, label = paste("Mean:", round(mean, 2))),
    nudge_y = -0.2,
    size = 3
  ) +
  scale_fill_manual(values = c("Data 1" = "red", "Data 2" = "blue", "Data 3" = "green")) +
  theme_minimal() +
  labs(title = "Boxplots of Random Data", x = "Category", y = "Values", fill = "Category")
I am now wondering about the following question:
Suppose I only have access to summary_stats and not to the original data. Is it still possible to make this kind of boxplot in R? I can extract any summary statistics I need from the original data beforehand (e.g. standard deviation, mean, median, quartiles, min, max), but the plot itself would have to be built from those statistics alone.
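For reference, a boxplot only needs a few numbers per group: the five values that define the whiskers and the box, plus the individual outlying values. A minimal base R sketch of that idea (not necessarily the best fit here) is to compute those numbers with boxplot.stats() while the raw data is still available and then draw the plot from them alone with graphics::bxp(). The list name bxp_input is only illustrative. One caveat: boxplot.stats() uses Tukey's hinges (fivenum()), which can differ slightly from the quantile() quartiles stored in summary_stats above.

```r
# Base R sketch: keep only per-group summaries, then draw boxplots from them.
set.seed(123)
data1 <- rnorm(100)
data2 <- rnorm(100)
data3 <- rnorm(100)
df <- data.frame(
  Category = rep(c("Data 1", "Data 2", "Data 3"), each = 100),
  Values = c(data1, data2, data3)
)

# While the raw data is still available: boxplot.stats() returns
# $stats (lower whisker end, lower hinge, median, upper hinge, upper whisker end),
# $n (group size) and $out (the outlying values).
per_group <- lapply(split(df$Values, df$Category), boxplot.stats)

# Everything needed later fits in this small list, in the format bxp() expects:
# 'stats' is a 5 x k matrix (one column per box), 'out' holds all outliers,
# and 'group' gives the box (column) each outlier belongs to.
bxp_input <- list(
  stats = sapply(per_group, function(s) s$stats),
  n     = sapply(per_group, function(s) s$n),
  out   = unlist(lapply(per_group, function(s) s$out), use.names = FALSE),
  group = rep(seq_along(per_group), sapply(per_group, function(s) length(s$out))),
  names = names(per_group)
)

# Draw from the summaries alone; df is not used in this call.
bxp(bxp_input, boxfill = c("red", "blue", "green"),
    main = "Boxplots from summary statistics")
```

Only bxp_input has to be kept or shared; the raw values are not needed to redraw the plot, outliers included.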
This is as close as I got:
summary_stats <- df %>%
  group_by(Category) %>%
  summarise(
    lower = quantile(Values, 0.25),  # Lower quartile
    middle = median(Values),         # Median
    upper = quantile(Values, 0.75),  # Upper quartile
    ymin = min(Values),              # Minimum
    ymax = max(Values)               # Maximum
  )
ggplot(summary_stats, aes(x = Category, fill = Category)) +
  geom_boxplot(aes(
    lower = lower,
    middle = middle,
    upper = upper,
    ymin = ymin,
    ymax = ymax
  ), stat = "identity") +
  scale_fill_manual(values = c("Data 1" = "red", "Data 2" = "blue", "Data 3" = "green")) +
  theme_minimal() +
  labs(title = "Boxplots of Summary Statistics", x = "Category", y = "Values", fill = "Category")
I think this recreates most of the plot, but the outliers are missing. I am not sure how to store the outliers alongside the summary statistics and then recreate the boxplot, outliers included, directly from them. Can someone please show me how to do this?
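One possible approach, sketched here under the assumption that the goal is to reproduce ggplot2's default boxplot: keep one summary row per group, store the outliers in a second data frame (one row per outlying value), and add them back with geom_point(). The sketch also changes ymin/ymax: by default geom_boxplot() does not extend the whiskers to the minimum and maximum but to the most extreme observation within 1.5 * IQR of the box, so min/max whiskers only match the original plot when a group has no outliers. The helper summarise_group() and the names whisker_coef and outlier_df are hypothetical, chosen for illustration.

```r
library(ggplot2)

set.seed(123)
data1 <- rnorm(100)
data2 <- rnorm(100)
data3 <- rnorm(100)
df <- data.frame(
  Category = rep(c("Data 1", "Data 2", "Data 3"), each = 100),
  Values = c(data1, data2, data3)
)

whisker_coef <- 1.5  # Tukey's rule; matches the default 'coef' of stat_boxplot()

# Hypothetical helper: one summary row plus the outlying values for one group.
summarise_group <- function(v, category) {
  # quantile() with its default type, as stat_boxplot() uses without weights
  q <- quantile(v, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  is_out <- v < q[1] - whisker_coef * iqr | v > q[3] + whisker_coef * iqr
  # Whiskers end at the most extreme non-outlying observation
  whiskers <- range(c(q, v[!is_out]))
  list(
    stats = data.frame(Category = category, lower = q[1], middle = q[2],
                       upper = q[3], ymin = whiskers[1], ymax = whiskers[2]),
    outliers = data.frame(Category = rep(category, sum(is_out)),
                          Values = v[is_out])
  )
}

groups <- split(df$Values, df$Category)
pieces <- Map(summarise_group, groups, names(groups))
summary_stats <- do.call(rbind, lapply(pieces, `[[`, "stats"))
outlier_df    <- do.call(rbind, lapply(pieces, `[[`, "outliers"))

# From here on only summary_stats and outlier_df are used.
p <- ggplot(summary_stats, aes(x = Category, fill = Category)) +
  geom_boxplot(aes(lower = lower, middle = middle, upper = upper,
                   ymin = ymin, ymax = ymax), stat = "identity") +
  # Outliers drawn as plain points from the second data frame
  geom_point(data = outlier_df, aes(x = Category, y = Values),
             inherit.aes = FALSE) +
  scale_fill_manual(values = c("Data 1" = "red", "Data 2" = "blue", "Data 3" = "green")) +
  theme_minimal() +
  labs(title = "Boxplots of Summary Statistics", x = "Category", y = "Values", fill = "Category")
print(p)
```

With this split, the two small data frames summary_stats and outlier_df are what you would save instead of the raw data; the existing geom_text_repel() layers can be added on top in the same way, as long as the mean is also stored in summary_stats.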