I am attempting to summarize data using data.table in R. The problem is that within my summarization framework the use of quantile
spits out the results in rows instead of columns. I have a simple enough workaround, but I wanted to see if there is a more direct way to get the output I want.
library(data.table)
library(magrittr)
#create some example data
dt<-data.table(ID=rep(1:3, each=100),
val=rnorm(300,100, sd=20))
#define percentiles of interest
percs=c(0.05, 0.10, 0.50, 0.75, 0.90, 0.95)
The following code provides all of the data I want, but not in the format I want. It puts each percentile on a new row and then replicates the previous summary statistics (mean, median, max min). I want each percentile in a new column.
#Summarize the data - this is the brevity I want but the format puts the percentiles in new rows
dt[,.(mean=mean(val, na.rm=T),
Median=median(val, na.rm=T),
min=min(val, na.rm=T),
max=max(val, na.rm=T),
as.list(quantile(val, probs=percs, na.rm=T))),
by=ID]
The following code gives me the format I want, but now I am doing two operations and gluing the results together. Ultimately this is fine, but I would rather a code structure like the previous snippet with output formatted as below.
#summarize data - this is the format I want it in but don't want the extra operations
dt[,.(mean=mean(val, na.rm=T),
Median=median(val, na.rm=T),
min=min(val, na.rm=T),
max=max(val, na.rm=T)),
by=ID]%>%
cbind(.,dt[,setDT(as.list(quantile(val, probs=percs, na.rm=T))),by=ID][,-1]) #-1 to remove extraneous ID column