I have a data frame `samples` which has 50 rows, 10 for each of 5 samples. In this data frame, the sample number is given by the column `sample_no`.
I also have a function `f` which takes a sample and returns a one-row summary data frame. When I write:
```r
samples %>%
  group_by(sample_no) %>%
  dplyr::summarise(d = f(.)) %>%
  unpack(cols = d)
```
`f` seems to see all samples at once (50 rows) rather than one sample at a time (10 rows). By contrast, `f(samples %>% filter(sample_no == 1))` sees 10 rows, as I would expect.
Is it the case that only certain special functions can be used inside summarise, and that those functions have to be ‘group-aware’? Or is something else happening?
MWE:
```r
library(tidyverse)

SAMPLE_SIZE <- 10
NSAMPLES <- 5

fresh_sample <- function() {
  data.frame(F = rnorm(SAMPLE_SIZE, 0, 1))
}

# I'd also love to know if there is a neater way to do this...
samples <- do.call(rbind, lapply(1:NSAMPLES, function(k) {
  fresh_sample() %>% mutate(sample_no = k)
}))

# I don't actually need to compute dimensions.
# The actual computation I want to apply to a sample is much more complex.
dims <- function(sample) {
  data.frame(nrow = nrow(sample), ncol = ncol(sample))
}

# This returns a 1-row df, with nrow = 10, ncol = 2
dims(samples %>% filter(sample_no == 1))

# This returns a 5-row df, each with nrow = 50, ncol = 2
samples %>%
  group_by(sample_no) %>%
  dplyr::summarise(d = dims(.)) %>%
  unpack(cols = d)
```
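
For comparison, here is a minimal sketch of the per-group behaviour described above as the expected one. It rests on two points: magrittr replaces `.` with the entire left-hand side of the pipe (here the full 50-row grouped data frame), whereas `dplyr::group_modify()` calls a function once per group. The fixed seed and the `rep()`-based construction of `samples` are assumptions added only to keep the sketch self-contained; `.keep = TRUE` is used so that each group still contains the `sample_no` column, as in the `filter()` call.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# Same shape as the MWE: 5 samples of 10 rows each.
samples <- data.frame(
  F = rnorm(50, 0, 1),
  sample_no = rep(1:5, each = 10)
)

dims <- function(sample) {
  data.frame(nrow = nrow(sample), ncol = ncol(sample))
}

# In `x %>% summarise(d = dims(.))`, magrittr substitutes `.` with `x`,
# the whole grouped data frame, so dims() is handed all 50 rows.

# group_modify() instead calls the function once per group:
# `.x` holds only that group's rows; .keep = TRUE retains the grouping column.
per_group <- samples %>%
  group_by(sample_no) %>%
  group_modify(~ dims(.x), .keep = TRUE) %>%
  ungroup()

# Base R equivalent: split into a list of per-sample data frames, then apply.
per_group_base <- do.call(rbind, lapply(split(samples, samples$sample_no), dims))

print(per_group)
print(per_group_base)
```

Both variants hand `dims()` one 10-row sample at a time; which one is preferable probably depends on whether the rest of the pipeline is already written in dplyr.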