I am creating a grouped boxplot using plotly. I have to specify the quantiles myself because I have a specific way of calculating them. I also want to show the outliers, as a standard boxplot does when plotly calculates the quantiles internally. I am currently adding them as a separate scatter trace, but the points end up in the middle of each group of boxes instead of over their own box. Maybe there is a way of adding them in the same plotly call that adds the grouped boxes, but if there is, I can't see it. How can I make the outliers line up with the boxes? Reprex below.
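library(dplyr)  # %>%, group_by(), summarise(), left_join(), filter()
library(plotly) # plot_ly(), add_trace(), layout()
set.seed(123) # Set seed for reproducibility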
# Create the site_name column with 5 different site names, each with 40 rows
site_name <- rep(paste0("site_", 1:5), each = 40)
# Create the site_type column with 20 'A's and 20 'B's for each site
site_type <- rep(c("A", "B"), each = 20, times = 5)
# Create the value column with one random number per row (200 rows in total)
value <- runif(200, min = 0, max = 200) # Random numbers between 0 and 200
# Combine into a data frame
df <- data.frame(site_name, site_type, value)
# Display the first few rows of the dataset
head(df, 20)
# Group by site_name and site_type, then calculate summary statistics
stats_df <- df %>%
  group_by(site_name, site_type) %>%
  summarise(
    lower_fence = quantile(value, probs = c(0.05), type = 5, na.rm = TRUE),
    q1 = quantile(value, probs = c(0.25), type = 5, na.rm = TRUE),
    median = quantile(value, probs = c(0.5), type = 5, na.rm = TRUE),
    mean = mean(value, na.rm = TRUE),
    q3 = quantile(value, probs = c(0.75), type = 5, na.rm = TRUE),
    upper_fence = quantile(value, probs = c(0.95), type = 5, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    .groups = 'drop'
  )
# Create the grouped box plot from the precomputed statistics
fig <- plot_ly(
  data = stats_df,
  x = ~factor(site_name),
  color = ~factor(site_type),
  colors = c("blue", "red"),
  type = "box",
  source = "boxes",
  lowerfence = ~lower_fence,
  q1 = ~q1,
  median = ~median,
  q3 = ~q3,
  upperfence = ~upper_fence,
  showlegend = TRUE
) %>%
  layout(boxmode = "group")
# Extract outliers: values outside the 5th-95th percentile fences of their group
filtered_df <- df %>%
  left_join(stats_df, by = c("site_name", "site_type")) %>%
  filter(value < lower_fence | value > upper_fence)
# Add the outlier points
fig <- fig %>%
  add_trace(
    data = filtered_df,
    x = ~factor(site_name),
    y = ~value,
    color = ~factor(site_type),
    colors = c("blue", "red"), # Same palette as the boxes
    type = "scatter",
    mode = "markers",
    marker = list(size = 5, opacity = 0.6), # Customize marker appearance
    showlegend = FALSE, # Hide legend for scatter points if desired
    inherit = FALSE
  )
# Show the figure
fig