I have the following problem: I want to investigate the mean differences of 5 balanced groups, but I only have the summary data (means, standard deviations, and sample sizes) available, as well as the information that the summary data is based on normally distributed data.
I’m open to suggestions on how to proceed.
This is some example data:
data <- data.frame(
  group = c("probe_a", "probe_b", "probe_c", "probe_d", "control"),
  n     = c(10, 10, 10, 10, 10),            # Sample size per group
  means = c(66.2, 84.8, 90.3, 78.3, 75.0),  # Group means
  sd    = c(5.9, 5.3, 4.8, 6.0, 3.8)        # Standard deviations per group
)
My approach was to use Welch’s ANOVA to account for unequal variances, followed by the Games-Howell post-hoc test. Since I couldn’t check the residuals directly, I simulated them from the summary data under the normality assumption and used **bootstrapping** to assess, with the Shapiro-Wilk and Levene tests, whether the residuals were normally distributed and homoscedastic. At a 5% significance level, the Monte Carlo p-value exceeded the threshold, which I took to mean that I couldn’t reject the null hypothesis of non-normality and non-homoscedasticity, and therefore that the residuals violated the assumptions of Welch’s ANOVA. Because this rough check did not support the assumptions, I concluded that the p-values from Welch’s ANOVA were somewhat exploratory, whereas the post-hoc results were exact.
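For reference, both Welch’s F-test and the Games-Howell comparisons depend on the data only through the group means, standard deviations, and sample sizes, so they can be computed from the summary table without simulating raw data. Below is a minimal base-R sketch under that reading; `welch_from_summary` and `games_howell_from_summary` are hypothetical helper names introduced here for illustration, not functions from any package.

```r
# Summary data from the question
data <- data.frame(
  group = c("probe_a", "probe_b", "probe_c", "probe_d", "control"),
  n     = c(10, 10, 10, 10, 10),
  means = c(66.2, 84.8, 90.3, 78.3, 75.0),
  sd    = c(5.9, 5.3, 4.8, 6.0, 3.8)
)

# Welch's one-way ANOVA computed from means (m), SDs (s), and sizes (n).
# Hypothetical helper; uses the same formula as stats::oneway.test(var.equal = FALSE).
welch_from_summary <- function(m, s, n) {
  k   <- length(m)
  w   <- n / s^2                              # precision weights
  mw  <- sum(w * m) / sum(w)                  # weighted grand mean
  lam <- sum((1 - w / sum(w))^2 / (n - 1))
  f   <- (sum(w * (m - mw)^2) / (k - 1)) /
         (1 + 2 * (k - 2) / (k^2 - 1) * lam)
  df1 <- k - 1
  df2 <- (k^2 - 1) / (3 * lam)
  c(statistic = f, df1 = df1, df2 = df2,
    p = pf(f, df1, df2, lower.tail = FALSE))
}

# Games-Howell pairwise comparisons from the same summary statistics.
# Hypothetical helper; p-values come from the studentized range distribution.
games_howell_from_summary <- function(group, m, s, n) {
  k     <- length(m)
  pairs <- t(combn(k, 2))
  i     <- pairs[, 1]
  j     <- pairs[, 2]
  vi    <- s[i]^2 / n[i]                      # squared standard errors
  vj    <- s[j]^2 / n[j]
  df    <- (vi + vj)^2 / (vi^2 / (n[i] - 1) + vj^2 / (n[j] - 1))
  q     <- abs(m[i] - m[j]) / sqrt((vi + vj) / 2)
  data.frame(
    pair  = paste(group[i], group[j], sep = " - "),
    diff  = m[i] - m[j],
    df    = df,
    p_adj = ptukey(q, nmeans = k, df = df, lower.tail = FALSE)
  )
}

welch <- welch_from_summary(data$means, data$sd, data$n)
gh    <- games_howell_from_summary(data$group, data$means, data$sd, data$n)
print(welch)
print(gh)
```

Under the stated normality assumption these results are the same as what the raw data would give, because the tests never use anything beyond the three summary columns. How well the Welch approximation itself holds at n = 10 per group is a separate question.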
Is there a more robust method to obtain exact p-values for an omnibus test? I have considered comparing the F-statistic of Welch’s ANOVA with the bootstrapped F-statistic of a robust ANOVA (with trimmed means and a heteroscedasticity-consistent covariance matrix), based on data simulated from the summary data and the normality assumption, in order to check the validity of my approach.
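One simpler variant of this idea can be sketched in base R. It is not the trimmed-means comparison described above, but a parametric Monte Carlo reference distribution for the Welch F-statistic under the null hypothesis of equal means. Two assumptions are made here: the data are normal, as stated, and the observed standard deviations are treated as the true population values, which is an approximation. Under normality, group means and variances can be drawn directly from their sampling distributions, so no raw data need to be simulated. `welch_f`, `sd_obs`, `f_null`, and `p_mc` are names made up for this sketch.

```r
# Summary data from the question
n      <- c(10, 10, 10, 10, 10)
means  <- c(66.2, 84.8, 90.3, 78.3, 75.0)
sd_obs <- c(5.9, 5.3, 4.8, 6.0, 3.8)
k      <- length(n)

# Welch F-statistic from summary statistics (hypothetical helper)
welch_f <- function(m, s, n) {
  k   <- length(m)
  w   <- n / s^2
  mw  <- sum(w * m) / sum(w)
  lam <- sum((1 - w / sum(w))^2 / (n - 1))
  (sum(w * (m - mw)^2) / (k - 1)) / (1 + 2 * (k - 2) / (k^2 - 1) * lam)
}

f_obs <- welch_f(means, sd_obs, n)

set.seed(42)
B <- 5000  # number of Monte Carlo replicates

# Null distribution: equal means (set to 0, since the statistic is
# location-invariant) and group SDs fixed at the observed values (assumption).
f_null <- replicate(B, {
  m_sim <- rnorm(k, mean = 0, sd = sd_obs / sqrt(n))           # sample means
  s_sim <- sd_obs * sqrt(rchisq(k, df = n - 1) / (n - 1))      # sample SDs
  welch_f(m_sim, s_sim, n)
})

# Monte Carlo p-value with the usual +1 correction
p_mc <- (1 + sum(f_null >= f_obs)) / (B + 1)
print(c(f_obs = f_obs, p_mc = p_mc))
```

The Monte Carlo p-value cannot go below 1 / (B + 1), so a larger `B` is needed to resolve very small p-values. Comparing `p_mc` with the p-value from the F approximation gives a rough indication of how well that approximation holds for these sample sizes and variances.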