I came across a problem on R-bloggers that I thought I’d try out as a fun project. However, the loop inside purrr::map() is a bottleneck in my code, so I was wondering whether anyone has ideas on how to vectorize the score calculation in the code below and avoid the loop.
The gist of the problem is that Alice and Bob get points based on pairs of consecutive tosses in a sequence of coin tosses. Alice gets a point for every heads followed by heads, while Bob gets a point for every heads followed by tails. So in the sequence (H, H, T, H, H), Alice gets 2 points and Bob gets 1 point.
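To make the scoring rule concrete, here is a minimal sketch that scores the example sequence above by pairing each toss with the one that follows it. The names flips_example, first, second, alice_example and bob_example are only illustrative and are not used elsewhere in the post.

```r
# Example sequence from the text: H, H, T, H, H (TRUE = heads)
flips_example <- c(TRUE, TRUE, FALSE, TRUE, TRUE)

# Pair each toss with the toss that follows it
first  <- flips_example[-length(flips_example)]  # tosses 1 to 4
second <- flips_example[-1]                      # tosses 2 to 5

alice_example <- sum(first & second)   # heads followed by heads
bob_example   <- sum(first & !second)  # heads followed by tails

print(c(alice = alice_example, bob = bob_example))
```

The four overlapping pairs are (H, H), (H, T), (T, H) and (H, H), which gives Alice 2 points and Bob 1 point.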
I’m using these libraries in addition to base R:
library(dplyr) # bind_rows
library(purrr) # map
I’m using this function to create one game
fn_is_heads <- function(count, sample, sample_size){
  # count receives the game index from map() and is otherwise unused
  ret <- sample(sample, size = sample_size, replace = TRUE, prob = c(0.5, 0.5))
  return(ret)
}
Then I’m using this to create a list of games (in this case 100,000 games of 100 tosses)
trials <- 1e5
flip_nbr <- 100
flips <- map(1:trials, .f = fn_is_heads, sample = c(TRUE, FALSE), sample_size = flip_nbr)
I wrote this function to calculate the scores per game
fn_score_vec <- function(flips){
  # alice gets points -> TRUE, TRUE
  # bob gets points -> TRUE, FALSE
  alice_pts <- sum(flips[-length(flips)] == TRUE & flips[-1] == TRUE)
  bob_pts <- sum(flips[-length(flips)] == TRUE & flips[-1] == FALSE)
  return(data.frame(alice = alice_pts, bob = bob_pts))
}
...and I use purrr::map() to iterate over the list of games and return the scores per game:
score <- bind_rows(map(.x = flips, .f = fn_score_vec))
Calculating the scores takes about 11 s on my PC. Does anyone have suggestions on how to get rid of purrr::map() and speed up the calculation of the scores (other than running it in parallel)?
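One possible direction, offered as a minimal sketch rather than a benchmarked solution: keep all games in a single logical matrix (one row per game, one column per toss) and count the pairs with rowSums(), so that no per-game function call or per-game data frame is needed. The names flips_mat, first, second and score_mat are hypothetical, the seed is only there to make the sketch reproducible, and the number of games is reduced to keep the example quick.

```r
set.seed(1)       # assumption: fixed seed, only for reproducibility
trials <- 1e4     # assumption: fewer games than the 1e5 used in the question
flip_nbr <- 100

# One row per game, one column per toss (TRUE = heads)
flips_mat <- matrix(
  sample(c(TRUE, FALSE), size = trials * flip_nbr, replace = TRUE),
  nrow = trials, ncol = flip_nbr
)

# Column-shifted copies line up each toss with the toss that follows it
first  <- flips_mat[, -flip_nbr, drop = FALSE]  # tosses 1 to flip_nbr - 1
second <- flips_mat[, -1, drop = FALSE]         # tosses 2 to flip_nbr

score_mat <- data.frame(
  alice = rowSums(first & second),   # heads followed by heads, per game
  bob   = rowSums(first & !second)   # heads followed by tails, per game
)

head(score_mat)
```

If the games already exist as a list of equal-length logical vectors, as in the flips object above, do.call(rbind, flips) should give the same kind of matrix. Whether this is actually faster on a given machine would need to be measured, for example with system.time().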