I am running cox PH survival analyses for diabetes as an outcome, with <10 variables and no interaction terms. The c stats are all consistent with the literature sitting at around 0.75-0.80, as expected. However, I am struggling to print calibration curves, as although the per decile points appear to be in essentially a diagonal line, the scale of the axes aren’t correct as the x=y is almost completely vertical. Here is the code that I have used:
# Function to fit Cox model, predict risk, calculate AUROC, and C-statistic
fit_cox_model <- function(train_data, test_data, event_var) {
cox_model <- coxph(Surv(time_to_event, get(event_var)) ~ age_at_reference + smoking_category +
waist_mm + hypertension_diag + glucose_log + fam_hist_binary,
data = train_data)
print(summary(cox_model))
test_data$pred_risk <- predict(cox_model, newdata = test_data, type = "risk")
# Calculate AUROC
roc_curve <- roc(test_data[[event_var]], test_data$pred_risk)
auroc <- auc(roc_curve)
ci_auroc <- ci.auc(roc_curve)
# Calculate C-statistic and its standard error
c_stat <- summary(cox_model)$concordance[1]
c_stat_se <- summary(cox_model)$concordance[2]
c_stat_ci_lower <- c_stat - 1.96 * c_stat_se
c_stat_ci_upper <- c_stat + 1.96 * c_stat_se
return(list(auroc = auroc, lower_ci_auroc = ci_auroc[1], upper_ci_auroc = ci_auroc[3],
c_stat = c_stat, c_stat_ci_lower = c_stat_ci_lower, c_stat_ci_upper = c_stat_ci_upper,
test_data = test_data))
}
Then for the calibration plots:
# Function to plot calibration curves per decile
plot_calibration_curve <- function(test_data, censoring_var, title_suffix) {
test_data <- test_data %>%
mutate(decile = ntile(pred_risk, 10)) %>%
group_by(decile) %>%
summarise(pred_risk_mean = mean(pred_risk), events = sum(get(censoring_var)), count = n()) %>%
mutate(observed_prob = events / count)
ggplot(test_data, aes(x = pred_risk_mean, y = observed_prob)) +
geom_point() +
geom_line() +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
labs(title = paste("Calibration Curve -", title_suffix),
x = "Mean Predicted Risk",
y = "Observed Probability") +
theme_minimal()
}
As can be seen in the attached image, the mean predicted risks are all above one, it appears to be well calibrated the scale is just off. Should I be using cph()? I have tried to do so but am struggling to get it to run, datadist() is causing issues. Any guidance would be appreciated! Thanks.
I have tried to use cph() instead of coxph() as it has an inbuilt calibration curve function, but am struggling to get it to run, datadist() is causing issues.
Isobel Howard is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
1