Longtime reader, first-time asker here, so apologies if this question is worded or formatted poorly. TL;DR: when I try to run a for loop on a column inside data frames stored in a list, I get errors. I have attempted a few different options (below), none of which have worked. Here is a reproducible example:
# make 3 data frames, each with 3 columns
ex1 <- data.frame(id = c('K', 'Q', 'E', 'A'),
adj.P.Val = c(0.01, 0.06, 0.04, 0.03),
logFC = c(1.3, 1.6, 1.2, 1.7))
ex2 <- data.frame(id = c('K', 'Q', 'E', 'A'),
adj.P.Val = c(0.02, 0.04, 0.06, 0.08),
logFC = c(1.6, 1.2, 0.8, 1.9))
ex3 <- data.frame(id = c('K', 'Q', 'E', 'A'),
adj.P.Val = c(0.03, 0.01, 0.09, 0.04),
logFC = c(0.9, 1.6, 1.7, 1.0))
In reality, I have 10 dataframes to work on, each with tens of thousands of rows of data, so my strategy has been to create a list with all of the dfs. I need to do several things to these dataframes. So far I have been successful in sorting the dfs within the list alphabetically by the ‘id’ column:
# put example dfs into a list
exlist <- list(ex1, ex2, ex3)
# loop to arrange dfs by ID alphabetically (arrange() comes from dplyr)
library(dplyr)
for (i in seq_along(exlist)) {
  exlist[[i]] <- arrange(exlist[[i]], id)
}
If I understand the result of this code correctly, I am actually only modifying the elements within the list, not the data frames in the global environment. This is fine as it means I won’t have to reload the original csv files if I make a mistake (I know I can just make copies of everything which you will see I have been doing later).
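A quick way to check that understanding is the base-R sketch below. It uses order() as a stand-in for dplyr's arrange() so that it runs without extra packages, and it only rebuilds ex1 from the example above; it is an illustration, not part of my actual pipeline.

```r
# Sketch: modifying a list element should leave the original data frame untouched
ex1 <- data.frame(id = c('K', 'Q', 'E', 'A'),
                  adj.P.Val = c(0.01, 0.06, 0.04, 0.03),
                  logFC = c(1.3, 1.6, 1.2, 1.7))
exlist <- list(ex1)

# sort the copy held in the list (order() is used here instead of arrange())
exlist[[1]] <- exlist[[1]][order(exlist[[1]]$id), ]

exlist[[1]]$id  # ids in the list copy: A, E, K, Q
ex1$id          # ids in the original: still K, Q, E, A
```

If the two id columns differ as the comments describe, the list holds its own copies and the objects in the global environment are unaffected.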
Now onto my issue. I need to look at the ‘adj.P.Val’ column. Wherever adj.P.Val > 0.05, I need to set both adj.P.Val and logFC in that row to 0. I tested this on a data frame outside the list, and this code works:
#proof of concept, how to change all values > x in column y to == z
testex <- ex1
testex$adj.P.Val[testex$adj.P.Val > 0.05] <- 0
testex$logFC[testex$adj.P.Val == 0] <- 0
testex # you can see the adj.P.Val and logFC values for id = Q are now 0
However, when I try a for loop to run this on the column in the data frames inside the list, I get errors. I have attempted a few different options, none of which have worked, which leads me to believe my error lies in the fundamentals of how I am trying to edit columns of dfs in a list. Before I abandon the list and loop, I want to ask here whether looping through columns of data frames in a list is even possible. And yes, I have absolutely spent 30x more time trying to figure out how to automate this than it would have taken me to just write the code for the 10 dfs individually, but I am stubborn lol. Here are some things I tried and the errors I got when running them. The attempts only cover setting adj.P.Val to 0 where adj.P.Val > 0.05, but I also need the logFC[adj.P.Val == 0] <- 0 step.
for (i in 1:length(exlist)) {
exlist[[i]] <- i$adj.P.Val[i$adj.P.Val > 0.05] <- 0
}
# Error in `*tmp*`$adj.P.Val : $ operator is invalid for atomic vectors
# attempt to solve above error with proper syntax for calling columns in atomic vectors:
for (i in 1:length(exlist)) {
exlist[[i]] <- i['adj.P.Val'][i['adj.P.Val'] > 0.05] <- 0
}
# Error in exlist[[i]] <- i["adj.P.Val"][i["adj.P.Val"] > 0.05] <- 0 : [[ ]] subscript out of bounds
# I won't clog this up further by putting every single iteration of the above with different syntax that I tried, but I tried [[]], [""], [[""]], etc.
# Trying pluck()
for (i in 1:length(exlist)) {
exlist[[i]] <- pluck(exlist, i, "adj.P.Val")[pluck(exlist, i, "adj.P.Val") > 0.05] <- 0
}
# No error, it just makes everything (like each entire df object) in the list have a value of 0
# Maybe the loop is the problem? Let's try lapply
lapply(testex, function(testex) testex[['adj.P.Val']][testex[['adj.P.Val']] > 0.05] <- 0 )
# Error in `*tmp*`[["adj.P.Val"]] : subscript out of bounds
# Also tried lapply with df$column syntax, got the same error for $ as above
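For comparison, here is a minimal sketch of the same loop written so that it indexes the list (exlist[[i]]) instead of the counter i. In the attempts above, i is just a number, which would explain the "$ operator is invalid for atomic vectors" error, and a chained assignment such as a <- b[...] <- 0 stores the value 0 in a, which matches the all-zero result. The variable names df and not_sig are made up for illustration, and only two of the example data frames are rebuilt to keep it short.

```r
# Rebuild two of the example data frames so the sketch is self-contained
ex1 <- data.frame(id = c('K', 'Q', 'E', 'A'),
                  adj.P.Val = c(0.01, 0.06, 0.04, 0.03),
                  logFC = c(1.3, 1.6, 1.2, 1.7))
ex2 <- data.frame(id = c('K', 'Q', 'E', 'A'),
                  adj.P.Val = c(0.02, 0.04, 0.06, 0.08),
                  logFC = c(1.6, 1.2, 0.8, 1.9))
exlist <- list(ex1, ex2)

for (i in seq_along(exlist)) {
  df <- exlist[[i]]                # i is only the position; the data frame is exlist[[i]]
  not_sig <- df$adj.P.Val > 0.05   # logical vector marking rows above the cutoff
  df$adj.P.Val[not_sig] <- 0       # one assignment per statement, no chaining
  df$logFC[not_sig] <- 0
  exlist[[i]] <- df                # write the modified copy back into the list
}

exlist[[1]]  # the row with id = Q should now have adj.P.Val and logFC of 0
```

Computing not_sig once and reusing it also avoids relying on adj.P.Val == 0 to find the rows for logFC.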
Is there a way to edit columns inside dfs inside a list or should I just do it manually for each individual data frame? Also, in the future I will need to extract the data frames from the list, though I am sure the answer to that is much easier to find. My other option, if this is an unsolvable problem, would be to figure out how I can make a function that does everything I need (alphabetize, turn large P values to 0, rename columns, etc) and just run each individual data frame through that, so it is still somewhat automated. Obviously you don’t have to answer all these extra thoughts, just sharing my line of thinking. I appreciate any and all help! Thanks 🙂
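To make the function idea concrete, here is a rough base-R sketch of what I have in mind. The helper name clean_df and its cutoff argument are hypothetical, order() stands in for arrange(), and the column-renaming step is left out because I have not settled on the new names.

```r
# Rebuild the three example data frames so the sketch is self-contained
ex1 <- data.frame(id = c('K', 'Q', 'E', 'A'),
                  adj.P.Val = c(0.01, 0.06, 0.04, 0.03),
                  logFC = c(1.3, 1.6, 1.2, 1.7))
ex2 <- data.frame(id = c('K', 'Q', 'E', 'A'),
                  adj.P.Val = c(0.02, 0.04, 0.06, 0.08),
                  logFC = c(1.6, 1.2, 0.8, 1.9))
ex3 <- data.frame(id = c('K', 'Q', 'E', 'A'),
                  adj.P.Val = c(0.03, 0.01, 0.09, 0.04),
                  logFC = c(0.9, 1.6, 1.7, 1.0))

# Hypothetical helper: sort one data frame by id and zero out non-significant rows
clean_df <- function(df, cutoff = 0.05) {
  df <- df[order(df$id), ]           # alphabetize by id
  not_sig <- df$adj.P.Val > cutoff   # rows above the cutoff
  df$adj.P.Val[not_sig] <- 0
  df$logFC[not_sig] <- 0
  rownames(df) <- NULL               # reset row names after sorting
  df
}

# Naming the list elements makes it easy to pull data frames back out later
exlist <- list(ex1 = ex1, ex2 = ex2, ex3 = ex3)
cleaned <- lapply(exlist, clean_df)

cleaned$ex1   # extract a single data frame from the list by name
# list2env(cleaned, envir = globalenv())  # would overwrite ex1, ex2, ex3 in the workspace
```

The list2env() line is commented out on purpose, since it would replace the original objects in the global environment.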