I have a data frame, with each column containing data on group membership. For each column, I would like to calculate the frequency of each group in that column as well as the percentage of each group out of the total frequency for that particular column.
I would like the results (frequency table) to be stored in a list, with each element of the list being the result (frequency by group, percentage by group) for a particular column.
Here is what the data frame look like:
df <- data.frame("gender" = c("male", "female", "", "male", "female"),
"score1" = c(1, 1, NA, NA, 3),
"score2" = c(NA, NA, 3, 4, 5),
"score3" = c(2, 2, 3, 3, NA))
I have attempted the following code, which works on only 1 variable/column at a time.
countsbyvars <- function(var) {
df %>%
select({{var}}) %>%
drop_na() %>%
group_by({{var}}) %>%
summarise(n = n()) %>%
mutate(freq = paste0(n / sum(n) * 100, "%"))
}
I would like to loop the column names of the data frame as input into the above user-generated function and store the result into a list.
I have attempted the following 3 pieces of code that I have written, but it does not working
# Using a for loop
res <- list()
for (i in names(df)) {
res[i] <- countsbyvars(i)
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `i` is not found.
}
# Using lapply to loop
lapply(df, countsbyvars(names(df)))
Error in `group_by()`:
ℹ In argument: `names(df)`.
Caused by error:
! `names(df)` must be size 0 or 1, not 317.
# Not sure how to use the across() function without hardcoding the column name
The end result would be something like this:
[[1]]
male | 2 | 50%
female | 2 | 50%
[[2]]
1 | 2 | 66.666666%
3 | 1 | 33.333333%
[[3]]
3 | 1 |33.33333%
4 | 1 | 33.33333%
5 | 1 | 33.33333%
[[4]]
2 | 2 | 50%
3 | 2 | 50%