I am running out of memory because I am working with too many columns. I have 900 data frames, each with 2 million rows, and each data frame contains the values for one individual. I tried to merge all the data frames together and then transpose the result as follows:
groupImages <- list.files('quantification/')  # list all the individual files
bindDF <- read.csv(paste0('quantification/', groupImages[1]))  # load the first file to start the data frame
for (i in 2:length(groupImages)) {
  subDF <- read.csv(paste0('quantification/', groupImages[i]))
  bindDF <- cbind(bindDF, subDF)  # grow bindDF by one block of columns per file
}
bindDF.transp <- as.data.frame(t(bindDF[, -1]))  # drop the first column, then transpose so rows = individuals
Now, I would like to merge this data frame with another one containing information about the group of each individual (roughly as sketched after this paragraph). My overall aim is to calculate:
- The effect size of differences across groups
- The variance of the measure in each column
However, I am unable to proceed because my RAM (~100 GB) is running out. Can I improve the way my code handles the data frames, or is it simply not feasible to work with 2 million columns?
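For reference, this is roughly how I was planning to attach the group labels once the big data frame exists (just a sketch: the file name 'groups.csv', the column name 'group', and the assumption that its rows line up with the rows of bindDF.transp are all placeholders on my side):

groupInfo <- read.csv('groups.csv')  # assumed: one row per individual, in the same order as the rows of bindDF.transp
mergedDF <- cbind(group = groupInfo$group, bindDF.transp)  # relies on matching row order, no join key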
One solution would be to reduce the number of columns by excluding those where more than 10 participants (rows) have values < 0.05 or NA. This could cut the number of columns by 25-50%, but I am concerned about losing track of the overall structure/order of the columns. It is important that, in the end, I have one row for the overall variance and one for the effect size, both the same length as the original data frames (2 million columns), because I want to recreate a 3D matrix that summarises my data across all participants.
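Something like the following is what I am picturing for the filtering while keeping track of the original column positions (a rough sketch; all object names are placeholders, and the 0.05 / 10-participant cut-offs are the ones mentioned above):

badCount <- colSums(bindDF.transp < 0.05 | is.na(bindDF.transp))  # per column: participants below 0.05 or missing
keepIdx <- which(badCount <= 10)  # original column positions, so the order is preserved
reducedDF <- bindDF.transp[, keepIdx]

# Results computed on the reduced data could then be put back into full-length (2 million) vectors:
fullVariance <- rep(NA_real_, ncol(bindDF.transp))
fullVariance[keepIdx] <- apply(reducedDF, 2, var, na.rm = TRUE)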
Would this work? And how do I implement it?
Many thanks
K