I am looking for an algorithm, and an implementation of it in R, for sample selection. I have a data.frame with i objects, each of which has j unique features. In addition, I have k > 100 samples, and in each sample every feature takes one of three values (0, 1 or 2).
My goal is to select the minimum number of samples such that, for each object (not for each feature), each of the three values occurs at least three times across that object's features in the selected samples. Can you help me?
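One possible formalisation, assuming the count for an object is pooled over all of its features and all selected samples. The notation is introduced here for illustration and is not from the question: $F_o$ is the set of features of object $o$, $x_{f,s} \in \{0,1,2\}$ is the value of feature $f$ in sample $s$, and $y_s \in \{0,1\}$ marks whether sample $s$ is selected.

```latex
c_{o,v,s} = \bigl|\{\, f \in F_o : x_{f,s} = v \,\}\bigr|

\min_{y} \; \sum_{s=1}^{k} y_s
\quad \text{s.t.} \quad
\sum_{s=1}^{k} c_{o,v,s}\, y_s \;\ge\; 3
\quad \forall\, o \in \{1,\dots,i\},\; v \in \{0,1,2\},
\qquad y_s \in \{0,1\}.
```

Written this way it has the shape of a covering integer program (a generalisation of set cover): an integer-programming solver can search for the exact minimum, while a greedy heuristic gives a quick upper bound.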
Here is an example data.frame that can easily be scaled down:
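library(dplyr)  # for mutate(); the native pipe |> needs R >= 4.1
set.seed(1234)
i <- 250  # objects
j <- 12   # features per object
k <- 105  # samples
values <- c(0, 1, 2)
# one row per (object, feature) pair; features are labelled "object_feature"
dat <- data.frame(object = rep(1:i, each = j)) |>
  mutate(features = paste(object, 1:j, sep = "_"))
# add k columns sample1 ... samplek with a random value (0, 1 or 2) per feature
dat <- cbind(dat, sapply(X = paste0("sample", 1:k),
                         FUN = function(x) sample(x = values, size = nrow(dat), replace = TRUE),
                         simplify = FALSE, USE.NAMES = TRUE))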
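A greedy heuristic is one quick way to get a small, though not necessarily minimal, set of samples. It repeatedly adds the sample that covers the largest remaining deficit, then drops any sample that became redundant. This is a sketch in base R that rebuilds the example data itself so it runs standalone. It assumes the pooled, per-object reading of "three times". The names `counts`, `need`, `min_count` and `is_feasible` are illustrative and do not come from any package.

```r
# Greedy heuristic for the sample-selection problem (small, but not guaranteed minimal).
# Assumption: a value "occurs three times for an object" when it is counted over all of
# the object's features in all selected samples.
set.seed(1234)
i <- 250; j <- 12; k <- 105
values <- c(0, 1, 2)
min_count <- 3  # required occurrences per (object, value)

# Rebuild the example data in base R (same layout as in the question, no dplyr needed).
dat <- data.frame(object = rep(1:i, each = j))
dat$features <- paste(dat$object, 1:j, sep = "_")
sample_cols <- paste0("sample", 1:k)
for (s in sample_cols) dat[[s]] <- sample(values, nrow(dat), replace = TRUE)

# counts[o, v, s]: number of features of object o that take value values[v] in sample s
counts <- array(0L, dim = c(i, length(values), k))
obj <- factor(dat$object, levels = 1:i)
for (s in seq_len(k)) {
  counts[, , s] <- table(obj, factor(dat[[sample_cols[s]]], levels = values))
}

# TRUE if the samples in `sel` give every (object, value) pair at least min_count hits
is_feasible <- function(sel) {
  length(sel) > 0 &&
    all(rowSums(counts[, , sel, drop = FALSE], dims = 2) >= min_count)
}

# Greedy phase: repeatedly add the sample that removes the largest remaining deficit
need <- matrix(min_count, nrow = i, ncol = length(values))
selected <- integer(0)
remaining <- seq_len(k)
while (any(need > 0) && length(remaining) > 0) {
  gain <- vapply(remaining, function(s) sum(pmin(counts[, , s], need)), numeric(1))
  if (max(gain) == 0) break  # no remaining sample helps: requirement cannot be met
  best <- remaining[which.max(gain)]
  selected <- c(selected, best)
  need <- pmax(need - counts[, , best], 0)
  remaining <- setdiff(remaining, best)
}
if (any(need > 0)) warning("some object/value pairs cannot reach min_count")

# Pruning phase: drop any selected sample whose removal keeps the set feasible
for (s in rev(selected)) {
  if (is_feasible(setdiff(selected, s))) selected <- setdiff(selected, s)
}

cat(length(selected), "samples selected:", sample_cols[selected], "\n")
```

The greedy result is only an upper bound on the minimum. For a provably minimal set, the same `counts` array could be turned into the constraints of a 0/1 integer program (one constraint per object and value) and passed to an integer-programming solver.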
Thank you!