i am working on a project, where we are trying to format an table in R from wide to long format.
Generally the original table is formatted like this:
ID;Time_t;varX_t;varY_t;varZ_t,varU_t;Time_r;varX_r;varY_r;varZ_r,varU_r; (and so on)
all the values are numeric.
We would like to have the format resemble this:
ID;Time;varX; varY; varZ; varU
1 ;t ;varX[t;1];varY[t;1];varZ[t;1];varU[t;1]
1 ;r ;varX[r;1];varY[r;1];varZ[r;1];varU[r;1]
[...]
varX[t;1] is representing a specific value from variable x at the time of t, variable 1 etc.
Here is the Code:
Source: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reshape.html
library(reshape2) # mainly using reshape2, another option would be the library "data.frame"
Data = read.spss("Path")
Data_DF <- as.data.frame(Data) # Reformatting
id_vars <- c("ID",[...]) # defining all the varibales, staying the same (in total: 21)`
# Defining all the groupes of varibles, which should be reformatted as one each
measure.vars_Time <- c("t","r",[...]) # 13 variables
measure.vars_varX <- c("varX_t","varX_r",[...]) # 13 variables
measure.vars_varY <- c("varY_t","varY_r",[...]) # 13 variables
measure.vars_varZ <- c("varZ_t","varZ_r",[...]) # 13 variables
measure.vars_varU <- c("varU_t","varU_r",[...]) # 13 variables
# melting each dataset individually --> could be an error source
melted_time <- melt(Data_DF, id.vars = id_vars, measure.vars = measure.vars_Time, variable.name = "time", value.name = "time_value")
melted_varX <- melt(Data_DF, id.vars = id_vars, measure.vars = measure.vars_varX, variable.name = "varX", value.name = "varX_value")
melted_varY <- melt(Data_DF, id.vars = id_vars, measure.vars = measure.vars_varY, variable.name = "varY", value.name = "varY_value")
melted_varZ <- melt(Data_DF, id.vars = id_vars, measure.vars = measure.vars_varZ, variable.name = "varZ", value.name = "varZ_value")
melted_varU <- melt(Data_DF, id.vars = id_vars, measure.vars = measure.vars_varU, variable.name = "varU", value.name = "varU_value")`
# the indiviudal data sets are looking like this, which is *not* the wanted format:
# ID;Time;varX; varX_value
# 1 ;t ;varX_t;varX[t;1]
# 2 ;t ;varX_t;varX[t;2]
#[...]
# 1 ;r ;varX_r;varX[r;1]
#[...]
# combining all indiviudal datasets
wide_5Var <- Reduce(function(x, y) merge(x, y, all=TRUE), list(melted_time, melted_varX, melted_varY, melted_varZ, melted_varU))
# This gives a table, which has over 1,800,000 entries (which is a little to much lol). It seems like the Code "multiplicates" causing so many entries.
# It fails to synchronize to the specific timestamps, so a row can look like this:
#ID; Time; Time_value; varX; varX_value; varY; varY_value; varZ; varZ_value; varU ; varU_value
# 1; t ; t ; varX_t; varX[t;1] ; varY_t; varY[t;1] ; varZ_r; varZ[r;1] ; varU_t; varU[t;1]
# Note: Now the Structure prioritizes the ID, putting all possible possiblities from on ID, before continuing with the next.
# We also tried other Methodes using the data.frame package, but this is the closest we got.`
New contributor
Knobert is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.