I am recently reading the book by Testing Statistical Hypotheses (fourth edition) by Lehmann and Romano. At the page 565, theorem 12.2.2, there is a famous coupling construction popularly known as Hajek Coupling, where sum of SRSWOR (simple random sampling with replacement) indicator coupled with the sum of independent
Bernoulli indictor in a claver way. This method. was first describe in the paper of Hajek (1961).
I tried to mimic the procedure to see the validity of the method. For that I conduct a small simulation study in R. using the following code.
rm(list = ls())
library(pbmcapply)
hajek_coupling_generator <- function(bern_ind, n){
B_N = sum(bern_ind)
H_ind <- rep(0, times = length(bern_ind))
if(B_N == n){
r <- 1
H_ind = bern_ind
}
else if(B_N < n){
r <- 2
ind <- union(which(bern_ind == 1),
sample(setdiff(1:N, which(bern_ind == 1)),
size = n-B_N, replace = F))
for (i in ind) {
H_ind[i] <- 1
}
}
else{
r <- 3
ind <- sample(which(bern_ind == 1),
size = n,
replace = F)
for (i in ind) {
H_ind[i] <- 1
}
}
return(list(H_ind, r))
}
N <- 30000
f_N <- 0.1
n <- floor(N*f_N)
Niter <- 2000
# bern_ind <- rbinom(N, 1, n/N)
x <- seq(from = 0, to = 1,
length.out = N) # any positive sequence.
L2_conv.result <- do.call(rbind, pbmclapply(1:Niter,
function(i) {
bern_ind <- rbinom(N, 1, n/N)
tilde_S_n <- sum(x*bern_ind)
H <- hajek_coupling_generator(bern_ind, n)
H_ind <- H[[1]]
r <- H[[2]]
S_n <- sum(x*H_ind)
num <- (tilde_S_n - S_n)^2
return(c(r,tilde_S_n, S_n, num))} , mc.cores = 7))
df <- as.data.frame(L2_conv.result)
mean(df$V4[df$V1 == 1])/var(df$V2[df$V1==1])
mean(df$V4[df$V1 == 2])/var(df$V2[df$V1==2])
mean(df$V4[df$V1 == 3])/var(df$V2[df$V1==3])
print(mean(L2_conv.result[,4])/var(L2_conv.result[,2]))
plot(ecdf(df$V2[df$V1==1]))
lines(ecdf(df$V3[df$V1==1]), col = "red")
plot(ecdf(df$V2[df$V1==2]))
lines(ecdf(df$V3[df$V1==2]), col = "red")
plot(ecdf(df$V2[df$V1==3]))
lines(ecdf(df$V3[df$V1==3]), col = "red")
plot(ecdf(L2_conv.result[,2]), main = "Hajek Coupling")
lines(ecdf(L2_conv.result[,3]), col = "red")
Here we mainly check the following,
mean(L2_conv.result[,4])/var(L2_conv.result[,2])
tending to zero or not.
All the necessary notations are describe in the book.
My code is unable to verify this (i.e. my code does not show any convergence to 0).
>print(mean(L2_conv.result[,4])/var(L2_conv.result[,2]))
>0.7738122
>N <- 40000
>print(mean(L2_conv.result[,4])/var(L2_conv.result[,2]))
>0.7513368
>N <- 1e6
>print(mean(L2_conv.result[,4])/var(L2_conv.result[,2]))
>0.7385734
I am unable to find the where the things go wrong. Any kind of help appreciable.