I would like to know the best way to run 10x repeated 10-fold cross-validation for an elastic-net penalized Cox model in R. I am currently using the glmnet package, tuning alpha first and then lambda. I would like to use the C-index as the performance metric for choosing the best lambda, but I have not been able to find where it is stored in my glmnet object. My plan was to write a loop that sets a different seed in each iteration so that the CV is repeated 10 times, but without the C-index for each lambda value I am not sure how to choose the best lambda across the repeats. I can visualize the C-index for the different lambda values with plot(glmnetfit), but then I would have to inspect the plot for each repeat by eye, which does not seem very accurate and becomes impractical with more than 10 repeats.
Here is the code I have been using:
library(glmnet)
library(survival)
data("CoxExample")
x <- CoxExample$x
y <- CoxExample$y
alphas <- seq(0, 1, by = 0.1)
# Fit models with different alpha values
models <- lapply(alphas, function(alpha) {
  glmnet(x = x, y = y, family = "cox", alpha = alpha)
})
# Optimize alpha
set.seed(123)
cv_results <- lapply(seq_along(models), function(i) {
  alpha <- alphas[[i]]
  print(alpha)
  # Reuse the lambda path from the glmnet fit for this alpha
  lambda <- models[[i]]$lambda
  cv.glmnet(x = x, y = y, family = "cox",
            alpha = alpha, lambda = lambda)
})
# Lowest cross-validated error for each alpha (partial-likelihood deviance, the default measure for Cox)
optimal_cvm <- sapply(cv_results, function(cv_result) min(cv_result$cvm))
optimal_alpha <- alphas[which.min(optimal_cvm)]
optimal_alpha # 1
# Optimize lambda
models <- list()
for (i in 1:10) {
  set.seed(i * 123)
  models[[i]] <- cv.glmnet(x = x,
                           y = y,
                           family = "cox",
                           alpha = optimal_alpha,
                           type.measure = "C")
  print(models[[i]]$lambda.min)
}
# With type.measure = "C", cvm holds the C-index, which should be maximized rather than minimized
optimal_lambda_index <- sapply(models, function(model) which.max(model$cvm))
I end up with the best lambda for each of the 10 repeats. I would like to extract the C-index at the best lambda in each model or, if that is not possible, to hear what the best approach to repeated CV would be. Any help would be appreciated.
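To make the goal concrete, this is roughly what I have in mind. It is only a sketch and rests on several assumptions of mine: that the cvm component of a cv.glmnet fit holds the mean cross-validated C-index when type.measure = "C", that supplying one fixed lambda grid makes the repeats comparable, and that averaging the C-index over repeats is a sensible way to pick lambda. The names alpha_fixed, lambda_grid, cindex and c_at_min are placeholders, and alpha_fixed = 1 just stands in for whatever alpha was selected.

```r
library(glmnet)
library(survival)

data("CoxExample")
x <- CoxExample$x
y <- CoxExample$y

alpha_fixed <- 1   # placeholder for the alpha chosen earlier
n_repeats <- 10
n_folds <- 10

# One shared lambda grid so every repeat is evaluated at the same lambda values
lambda_grid <- glmnet(x, y, family = "cox", alpha = alpha_fixed)$lambda

# Rows are repeats, columns are lambda values
cindex <- matrix(NA_real_, nrow = n_repeats, ncol = length(lambda_grid))
c_at_min <- numeric(n_repeats)

for (r in seq_len(n_repeats)) {
  set.seed(123 * r)
  # New random fold assignment for each repeat
  foldid <- sample(rep(seq_len(n_folds), length.out = nrow(x)))
  fit <- cv.glmnet(x, y, family = "cox", alpha = alpha_fixed,
                   lambda = lambda_grid, foldid = foldid,
                   type.measure = "C")
  # Check that the returned lambdas line up with the start of the supplied grid
  stopifnot(isTRUE(all.equal(fit$lambda, lambda_grid[seq_along(fit$lambda)])))
  cindex[r, seq_along(fit$cvm)] <- fit$cvm
  # C-index at this repeat's own lambda.min
  c_at_min[r] <- fit$cvm[fit$lambda == fit$lambda.min]
}

# Average the C-index over repeats and take the lambda with the highest mean
mean_c <- colMeans(cindex, na.rm = TRUE)
best <- which.max(mean_c)
best_lambda <- lambda_grid[best]
best_c <- mean_c[best]

print(c_at_min)
print(c(lambda = best_lambda, mean_C = best_c))
```

If this is on the right track, c_at_min would give the C-index at the best lambda of each repeat, and best_lambda would be a single lambda chosen across all repeats.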
Thanks!