I’ve been trying to calculate a confidence interval and a p-value for the distance correlation (`energy::dcor`) between two variables.
The data (a 35 × 2 matrix; the first column is x, the second is y) is:

```r
data <- structure(c(-0.300232326122519, -0.548312017606863, 0.463488637846498, -0.957754830334244, 0.489005536982273, 1.60039641596019, 0.198325183245884, 0.955655402059422, -0.0985745060548668, -0.91537165915946, 0.821680161225068, -0.276711736364627, -0.145523522598367, 1.00647170925371, 1.47737776565003, -0.429641721588158, -0.983271729470019, -0.37651945096644, -0.155999049481896, -1.08716422579988, -0.393385723005449, 3.06380556423845, -0.0666207802774695, -0.983271729470019, -0.196076257229251, 0.278960662381259, -0.641728987053197, -0.121785441763213, 0.261876899264989, -0.682069767363957, -0.285408444947535, -0.425293367296704, 0.227926864109709, -1.08716422579988, 0.312910697536539, 0.68133965466455, 0.13653339873247, 0.484958662776278, 0.109524745220057, -0.462267094226346, -1.81263562663225, 1.20651893229729, 0.684989465059371, -0.906637989461132, -0.96098206068222, 0.360589246740862, -0.300919230157815, 0.309013933026176, 0.892150734542308, 0.574948993393824, 0.579038741277517, 0.352878039434503, -0.107402964012698, -0.675502008954803, -0.496565311995179, -0.700466752538984, -0.742382241262757, 0.938355977216629, -0.14196287615184, 0.386721998266575, 0.51643524328218, 0.455222542513213, 0.327148349208276, 0.747334267020053, 1.93710502202678, -1.19755578540738, -0.722586352547963, -1.54719618551513, 0.144258727383104, 0.227261483045605), dim = c(35L, 2L))
```
I’ve performed `energy::dcor.test` with 1000 replicates, with the following results:

```
data:  index 1, replicates 1000
dCor = 0.28877, p-value = 0.4785
sample estimates:
     dCov      dCor   dVar(X)   dVar(Y)
0.1469361 0.2887657 0.5039591 0.5137722
```
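As a quick consistency check on these sample estimates: distance correlation is the distance covariance rescaled by the two distance variances, dCor = dCov / sqrt(dVar(X) · dVar(Y)), much as Pearson's r rescales an ordinary covariance. The sketch below only plugs in the printed values (nothing is recomputed from the data) and reproduces the reported dCor:

```r
# Sample estimates exactly as printed by energy::dcor.test above
dCov  <- 0.1469361
dVarX <- 0.5039591
dVarY <- 0.5137722

# Distance correlation = distance covariance scaled by the distance variances
dCor <- dCov / sqrt(dVarX * dVarY)
print(round(dCor, 5))   # 0.28877, the dCor reported by dcor.test
```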
So I would say the correlation is not significant, which is also obvious from a plot of the data. However, when I try to get a 95% percentile bootstrap confidence interval for the same distance correlation, this is what I get:
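```r
library(boot)

# Statistic for boot(): distance correlation of the resampled rows i
boot.dcor <- function(data, i) {
  x <- data[i, 1]
  y <- data[i, 2]
  energy::dcor(x, y)
}

# 1000 balanced bootstrap replicates, then a 95% percentile interval
boot.out <- boot(data, statistic = boot.dcor, R = 1000, sim = "balanced")
boot.ci(boot.out, type = "perc")
```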
```
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL :
boot.ci(boot.out = boot.out, type = "perc")

Intervals :
Level     Percentile
95%   ( 0.2961,  0.5380 )
```
As you can see, the confidence interval is far from 0, and it doesn’t even contain the dCor obtained previously (0.289). I don’t understand why. Any help would be really appreciated, thank you.
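One possible explanation, offered as an assumption to check rather than a diagnosis. First, the sample dCor is never negative (it lies in [0, 1]) and is biased upward under independence, so unlike a Pearson or Spearman interval, an interval for dCor would not be expected to straddle 0. Second, resampling with replacement puts duplicated rows into almost every bootstrap sample (only about two thirds of the 35 rows are distinct on average). A duplicated (x, y) pair sits at distance zero in both variables at once, which the double-centered distance matrices read as agreement between x and y, so dCor on a resample tends to exceed dCor on the original sample even when the variables are independent; a percentile interval built from those replicates would then sit above the point estimate. The sketch below tries this on simulated independent data of the same size. `dcor_v` is a hypothetical base-R helper (the V-statistic form of the sample distance correlation, which I assume matches `energy::dcor` up to rounding), used so the example needs neither `energy` nor `boot`.

```r
# Hypothetical helper: sample distance correlation (V-statistic form),
# written in base R as a stand-in for energy::dcor
dcor_v <- function(x, y) {
  A <- as.matrix(dist(x))
  B <- as.matrix(dist(y))
  # Double-center each distance matrix: subtract row and column means, add grand mean
  A <- A - outer(rowMeans(A), colMeans(A), "+") + mean(A)
  B <- B - outer(rowMeans(B), colMeans(B), "+") + mean(B)
  dcov2 <- max(mean(A * B), 0)              # squared distance covariance
  denom <- sqrt(mean(A * A) * mean(B * B))  # sqrt(dVar^2(X) * dVar^2(Y))
  if (denom == 0) return(0)                 # dCor is defined as 0 if a distance variance is 0
  sqrt(dcov2 / denom)
}

set.seed(42)
n <- 35                 # same sample size as the data in the question
x <- rnorm(n)
y <- rnorm(n)           # generated independently of x
obs <- dcor_v(x, y)

# Ordinary nonparametric bootstrap: resample whole rows (pairs) with replacement
boot_reps <- replicate(1000, {
  i <- sample.int(n, n, replace = TRUE)
  dcor_v(x[i], y[i])
})

# Compare the observed dCor with the bootstrap replicates and their percentile interval
print(c(observed = obs, bootstrap_mean = mean(boot_reps)))
print(quantile(boot_reps, c(0.025, 0.975)))
```

If the replicates come out systematically above the observed value here too, with independence built in, that would suggest the upward shift in the question comes from the duplicated rows in the resamples rather than from real dependence in the data.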
I tried different ways to calculate the confidence interval by bootstrapping, for instance changing the `sim` argument of `boot` to get different resampling schemes (`"permutation"`, `"antithetic"`). I’ve also tried another way to get the CI, `bcaboot::bcajack(data, B = 1000, func = boot.dcor, alpha = 0.05)`. However, 0 is never inside the confidence intervals, which I would expect since the correlation is not significant. If I run the same procedure with a Pearson or Spearman correlation, I get confidence intervals that, as expected, contain 0 and are centered around it. So the problem really seems to be specific to the distance correlation. I also couldn’t find how the p-value is calculated in `dcor.test`, nor which methods other than bootstrapping could be used to get the confidence interval.
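On the p-value: as far as I can tell, the `energy` documentation says the test decision in `dcor.test` is obtained by permutation with `R` replicates (the `replicates 1000` in the output). The idea can be sketched as: shuffle y to break any pairing with x, recompute the statistic each time, and report the share of shuffled statistics at least as large as the observed one. This is a base-R illustration, not `energy`'s code; `dcov2_v` and `perm_test_dcov` are hypothetical names, and the add-one p-value formula is a common convention that I am assuming is close to what `dcor.test` uses. Because shuffling y leaves dVar(X) and dVar(Y) unchanged, testing with dCov or with dCor gives the same p-value.

```r
# Hypothetical helper: squared sample distance covariance (V-statistic)
dcov2_v <- function(x, y) {
  A <- as.matrix(dist(x))
  B <- as.matrix(dist(y))
  # Double-center both distance matrices
  A <- A - outer(rowMeans(A), colMeans(A), "+") + mean(A)
  B <- B - outer(rowMeans(B), colMeans(B), "+") + mean(B)
  mean(A * B)
}

# Permutation test of independence: shuffling y destroys any x-y dependence
# while keeping both marginal distributions intact
perm_test_dcov <- function(x, y, R = 999) {
  stat <- dcov2_v(x, y)
  reps <- replicate(R, dcov2_v(x, sample(y)))
  # Add-one correction: the observed pairing counts as one of the permutations
  p <- (1 + sum(reps >= stat)) / (R + 1)
  list(statistic = stat, p.value = p)
}

set.seed(1)
x <- rnorm(35)
res_ind <- perm_test_dcov(x, rnorm(35))  # independent by construction
res_dep <- perm_test_dcov(x, x^2)        # nonlinear dependence
print(c(independent = res_ind$p.value, quadratic = res_dep$p.value))
```

Note that this tests whether dCor is larger than expected under independence; it does not require an interval for dCor to contain 0, since the sample dCor is positive even for independent variables.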