I’m trying to create new dataframes with the top three values for each column across a dataframe.
probUnweighted <- data.frame(
  Sample1 = c(0.9, 0.2, 0.03, 0.1, 0.5, 0.09),
  Sample2 = c(0.045, 0.11, 0.006, 0.0036, 0.005, 0.025),
  Sample3 = c(0.05, 0.21, 0.06, 0.0067, 0.0105, 0.1025)
)
I’ve written a for loop to create a new dataframe for each column/sample:
library(dplyr); library(tibble); library(tidyr)
for (i in names(probUnweighted)) {
  assign(paste0("df_", i), probUnweighted %>%
    select(i) %>%
    slice_max(order_by = i, n = 3)
  )
}
If I stop at select(i), I generate three dataframes as expected, df_Sample1 and so on. However, slice_max keeps giving me an error:
Error in `slice_max()`:
! Can't compute indices.
Caused by error:
! `order_by` must have size 6, not size 1.
Does this give you what you want?
lapply(
  names(probUnweighted),
  function(x) {
    probUnweighted %>%
      select(all_of(x)) %>%
      slice_max(order_by = .data[[x]], n = 3)
  }
)
[[1]]
Sample1
1 0.9
2 0.5
3 0.2
[[2]]
Sample2
1 0.110
2 0.045
3 0.025
[[3]]
Sample3
1 0.2100
2 0.1025
3 0.0600
Though I agree with LMc’s implicit assumption that your data would be tidier if you pivoted into long format.
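In case it helps, my reading of the error is that order_by = i hands slice_max the length-1 string held in i (for example "Sample1") rather than the column of that name, hence "must have size 6, not size 1"; .data[[x]] looks the column up by name instead. Below is a minimal sketch of the same lapply approach that also names the list elements, so each result can be pulled out by name without assign(). The df_ prefix mirrors the question, and top3 is just a name I chose for illustration.

```r
library(dplyr)

probUnweighted <- data.frame(
  Sample1 = c(0.9, 0.2, 0.03, 0.1, 0.5, 0.09),
  Sample2 = c(0.045, 0.11, 0.006, 0.0036, 0.005, 0.025),
  Sample3 = c(0.05, 0.21, 0.06, 0.0067, 0.0105, 0.1025)
)

# Name the input vector so lapply() returns a named list
# (df_Sample1, df_Sample2, df_Sample3). "top3" is an illustrative name.
cols <- setNames(names(probUnweighted), paste0("df_", names(probUnweighted)))

top3 <- lapply(cols, function(x) {
  probUnweighted %>%
    select(all_of(x)) %>%                     # keep only the current column
    slice_max(order_by = .data[[x]], n = 3)   # look the column up by its name
})

top3$df_Sample1
```

If you really do want separate objects in your workspace, list2env(top3, envir = .GlobalEnv) should create them from the named list, though keeping everything in one list is usually easier to work with. Also note that slice_max keeps ties by default (with_ties = TRUE), so a column with tied values could return more than three rows.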
If you pivot your data to a long format, then you can do this using a by group:
library(dplyr)
library(tidyr)
pivot_longer(probUnweighted, everything()) |>
  slice_max(value, n = 3, by = name) |>
  pivot_wider(values_fn = list) |>
  unnest(everything())
# Sample1 Sample2 Sample3
# <dbl> <dbl> <dbl>
# 1 0.9 0.11 0.21
# 2 0.5 0.045 0.102
# 3 0.2 0.025 0.06
probUnweighted %>%
  reframe(across(everything(), ~head(sort(.x, TRUE), 3)))
Sample1 Sample2 Sample3
1 0.9 0.110 0.2100
2 0.5 0.045 0.1025
3 0.2 0.025 0.0600