I am trying to implement quantile regression with sampling weights in R. I know that with lm() and glm(), standard errors (and hence confidence intervals) are not estimated correctly when sampling weights are passed through the “weights =” argument, because that argument assumes precision weights; the survey package should be used instead. However, I cannot tell from the documentation whether the same applies to rq() for quantile regression, i.e. whether the “weights =” argument in rq() also assumes precision weights only. Does anyone know?
I found only one resource that applies quantile regression to survey data in R: the paper “Estimation of regression quantiles in complex surveys with data missing at random: An application to birthweight determinants”. It uses the following code:
    mydesign <- svydesign(ids = ~mycluster, strata = ~mystrata, fpc = ~myfpc,
                          data = mydata, nest = TRUE, weights = ~myweights)
    bootdesign <- as.svrepdesign(mydesign, type = "bootstrap", replicates = 100)
    fit <- withReplicates(bootdesign,
                          quote(coef(rq(y ~ x, tau = 0.5, weights = .weights, method = "fn"))))
The resulting fitted object fit contains the estimated regression coefficients and their bootstrap
variances. This object can be then passed to the following custom-defined function to produce a
summary table, including p values:
    format.rq.svy <- function(x, rdf) {
      V <- attr(x, "var")
      FLAG <- length(V) == 1
      se <- if (FLAG) sqrt(V) else sqrt(diag(V))
      val <- cbind(as.matrix(x), se, NA, NA)
      if (FLAG) val <- matrix(val, nrow = 1)
      val[, 3] <- val[, 1] / val[, 2]
      val[, 4] <- 2 * (1 - pt(abs(val[, 3]), rdf))
      colnames(val) <- c("Value", "Std. Error", "t value", "Pr(>|t|)")
      rownames(val) <- names(x)
      return(val)
    }
where the argument “rdf” specifies the residual degrees of freedom (i.e. n – q) for approximate p value
calculation using t-distributions.
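Read literally, the quoted definition would amount to something like the sketch below. It is only an illustration under assumptions: n is taken as the number of rows in the data and q as the number of estimated coefficients, and the toy data frame and the placeholder coefficient vector are made up for the example. The quoted passage does not say whether n is the raw number of observations.

```r
# Toy stand-ins (hypothetical): 200 observations, intercept plus one slope
mydata <- data.frame(y = rnorm(200), x = rnorm(200))
est <- c("(Intercept)" = 0.1, x = 0.5)  # placeholder for the coefficients in `fit`

n <- nrow(mydata)   # assumption: n = number of observations
q <- length(est)    # assumption: q = number of regression coefficients
rdf <- n - q        # literal reading of "n - q" from the quoted text
rdf
```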
I am not sure how to specify the residual degrees of freedom. How do I determine the value to pass as rdf?