i have something very unexpected happening with the R optimize
function. I have 2 very similar datasets, one on which optimize
works as expected, the other it doesn’t.
Here’s the function i’m trying to minimise and the 2 datasets:
my_beta_fc <- function(data, par){
# data is a dataframe of the ndays distribution with 2 columns:
# x = number of days ranging from 0 to x+
# f_x = reach (absolute)
a <- par
data <- data[,1:2]
names(data) <- c("x", "f_x")
n <- max(data$x)
p0 <- data %>%
mutate(reach_pc = f_x / sum(f_x)) %>%
filter(x == 0) %>%
pull(reach_pc)
xbar <- data %>%
summarise(mean = sum(f_x*x)/sum(f_x)) %>%
pull(mean)
b <- a*(n-xbar)/xbar
# we want to minimise the difference between observed and calculated p0:
result <- abs(p0 - (gamma(n+b)*gamma(a+b))/(gamma(a+n+b)*gamma(b)))
# if the evaluated function returns NaN, replace with a high penalty
# to steer the optimization away from regions where the function returns NaN:
if(is.nan(result)|is.na(result)){
return(1e10)
}
return(result)
}
t1 <- data.frame(
n_days = 0:7,
reach = c(40979971, 2110778, 1126387, 729457, 541512, 346607, 236263, 198262)
)
t2 <- data.frame(
n_days = 0:7,
reach = c(41233610, 2017354, 1063684, 694576, 518144, 330215, 223006, 188648)
)
then the result for t1 is:
param_solution <- optimize(
f = function(param) my_beta_fc(data = t1, param),
interval = c(0,10),
tol = 0.00001
)
gives:
> param_solution
$minimum
[1] 0.05495449
$objective
[1] 0.0000003038929
but running the same for t2:
param_solution <- optimize(
f = function(param) my_beta_fc(data = t2, param),
interval = c(0,10),
tol = 0.00001
)
gives:
> param_solution
$minimum
[1] 9.999995
$objective
[1] 10000000000
which is evidently not the solution as we can get a better value of the objective function using the solution found with t1…
Any idea what might be the issue here?