I am trying to fit SIRD model in R to real data. However, the observed values are lying nowhere on the fitted curve. I can’t understand what the error is or how to resolve it. My data is the Mexican COVID data from the Our World in Data dataset
https://ourworldindata.org/coronavirus -> “Download dataset”
My code works as expected up until the final lines where the optimisation function is not able to fit my model to the smoothed death data. My thought is that this is a fundamental problem with the model I’ve used so I’ve attached my entire code. I am using ‘Total Deaths (smoothed)’ to calibrate my model because ‘total cases’ does not update when an individual recovers or dies.
This is my code:
OWID <- read.csv("owid_covid_data.csv")
MEX <- OWID[65675:66223 , 1:62]
DEATHS <- OWID[65675:65875 , 8]
install.packages("deSolve")
library("deSolve")
sird_equations <- function(time, variables, parameters) {
with(as.list(c(variables, parameters)), {
dS <- - (beta * (1 - mu) * S * I ) / (S + I + R + D)
dI <- (beta * (1 - mu) * S * I) / (S + I + R + D) - sigma * I - theta * I
dR <- sigma * I
dD <- theta * I
return(list(c(dS, dI, dR, dD)))
})
}
parameter_values <- c(
beta = 0.4369472, #infectious contact rate (/person/day)
sigma = 0.083991, #recovery rate (/day)
theta = 0.00002815222, #mortality rate (/day)
mu = 0.0511453 #adjustment for efficacy of countermeasures
)
inital_values <- c(
S = 126000000, #suseptible population
I = 1, #infected on day 1 of data
R = 0, #recovered on day 1
D = 0 #dead on day 1
)
time_values <- seq(0, 200, 1)
sird_values_1 <- ode(
y = inital_values,
times = time_values,
func = sird_equations,
parms = parameter_values
)
sird_values_1 <- as.data.frame(sird_values_1)
with(sird_values_1, {
plot(
time, S,
ylim = c(0, max(sird_values_1[-1])),
type = "l", col = "blue", xlab = "time (days)", ylab = "# of people"
)
lines(time, I, col = "black")
lines(time, R, col = "red")
lines(time, D, col = "orange")
})
sird_values_1 <- as.data.frame(sird_values_1)
with(sird_values_1, {
plot(
time, I,
ylim = c(0, max(sird_values_1[-1])),
type = "l", col = "blue", xlab = "time (days)", ylab = "infections"
)
})
sird_1 <- function(beta, sigma, theta, mu, S0, I0, R0, D0, times) {
require(deSolve)
sird_equations <- function(time, variables, parameters) {
with(as.list(c(variables, parameters)), {
dS <- - (beta * (1 - mu) * S * I ) / (S + I + R + D)
dI <- (beta * (1 - mu) * S * I) / (S + I + R + D) - sigma * I - theta * I
dR <- sigma * I
dD <- theta * I
return(list(c(dS, dI, dR, dD)))
})
}
parameter_values <- c(beta = beta, sigma = sigma, theta = theta, mu = mu)
intial_values <- c(S = S0, I = I0, R = R0, D = D0)
out <- ode(intial_values, times, sird_equations, parameter_values)
as.data.frame(out)
}
sird_1(beta = 0.4, sigma = 0.1, theta = 0.05, mu = 0.1, S0 = 126000000, I0 = 1, R0 = 0, D0 = 0, times = seq(0, 540, 0.01))
plot(DEATHS)
predictions <- sird_1(beta = 0.3, sigma = 0.1, theta = 0.000959, mu = 0.1, S0 = 126000000, I0 = 1, R0 = 0, D0 = 0, times = seq(0, 200, 1))
with(predictions, lines(time, D, col="red"))
sum((DEATHS - predictions$I )^2, na.rm = T)
squaresum <- function(beta, sigma, theta, mu, data = DEATHS, N = 126000000) {
I0 <- 1
times <- time_values
predictions <- sird_1(beta = beta, sigma = sigma, theta = theta, mu = mu, S0 = N - I0, I0 = I0, R0 = 0, D0 = 0, times = time_values)
sum((predictions$D[-1] - DEATHS[-1])^2, na.rm = T)
}
squaresum(beta = 0.16, sigma = 0.1155556, theta = 0.00959596, mu = 0)
ss2 <- function(x) {
squaresum(beta = x[1], sigma = x[2], theta = x[3], mu = x[4])
}
ss2(c(0.004, 0.5, 0.2, 0))
starting_param_val <- c(0.5, 0.01, 0.04, 0)
ss_optim <- optim(starting_param_val, ss2)
ss_optim
ss_optim$par
plot(DEATHS)
predictions <- sird_1(beta = 0.4369472, sigma = 0.0839916, theta = 0.00002815222, mu = 0.05114537, S0 = 126000000, I0 = 1, R0 = 0, D0 = 0, times = seq(0, 200, 1))
with(predictions, lines(time, D, col="red"))
I was expecting a final plot where the predictions
line closely mapped onto the inital death data, my plan was to then extrapolate these parameter values to the entire dataset and discuss where and why the model varies from the real data. However, the parameter values my script estimates lie way off the real data despite supposedly being at the minimum squared distance. **Is there any way I could improve my code so that it estimates more accurate parameter values or is the error more fundamental?
**
My only thought is that perhaps it’s a flawed assumption to label the entire population of Mexico Susceptible
early in the pandemic and that’s why my model cannot fit the data? I would really appreciate support in understanding the issue(s) with my code.