I am attempting to write probability density function for 2×2 contingency table with given margins, using chi-squared statistics.
I am aware there are easy ways to do this with native R functions, but I am interested in trying to get a better understanding of R, thus I want to try this the “hard way”.
a b ab
c d cd
ac bd abcd
a, b, c, and d are the contingency table elements. In the parlance of gene set enrichment, a is the overlap, ab (a + b) is the total number of genes in gene set, ac (a + c) is the total number of genes in the group of interest (e.g., all overexpressed genes), and abcd is the total of all detected genes.
Typically, user has the values for a, ab, ac, and abcd easily available, hence this particular parameterization.
I want to create probability density function integrand() that, from arguments a, ab, ac, and abcd, yields value for the given a. Importantly, integrand() must be vectorized with respect to argument a as integrate() (see below) requires this for calculating the integral. Other arguments (ab, ac, and abcd are, and should be, vectors of length 1).
integrand <- function(a, ...)
{
args <- list(...)
sapply(a, (x, ...)
{
args <- list(...)
observed <- c(x, args[[1]] - x, args[[2]] - x, args[[3]] - args[[2]] - args[[1]] + x)
expected <- c(args[[1]] * args[[2]], args[[1]] * (args[[3]] - args[[2]]), (args[[3]] - args[[1]]) * args[[2]], (args[[3]] - args[[1]]) * (args[[3]] - args[[2]])) / args[[3]]
s <- sum((observed - expected)^2 / expected)
(s^-0.5) * exp(-s / 2) / (2^0.5 * gamma(0.5)) # df fixed to 1 appropriately for 2x2 contingency table.
}
, args[[1]], args[[2]], args[[3]])
}
ab <- 11
ac <- 10
abcd <- 25
# plot the pdf between 0 and 10
curve(integrand(x, ab, ac, abcd), 0, 10)
# integrate between 1 and 5
integrate(integrand, lower = 1, upper = 5, subdivisions = 1000, ab = ab, ac = ac, abcd = abcd)
But integrate() doesn’t always work, yielding an error message “roundoff error is detected in the extrapolation table”. I suspect because there is a sharp transition or discontinuity in the pdf.
I have banged my head to the wall two days now and I can’t figure out why my function is not working robustly. The only reason I can think of is that (s^-0.5) * exp(-s / 2) / (2^0.5 * gamma(0.5))
is somehow wrong, but I can’t see the mistake.