I am building a simple logistic regression model for use in neuropsychology offices. It takes a patient's cognitive-testing scores, compares them with past patients' data and diagnoses, and estimates which disease the patient is more likely to have (Parkinson's or Alzheimer's).

The problem is that when I call the `train()` function (from caret), the model can't be fitted because every patient has several NA values for tests they did not take. I need the model to work anyway, because we can't have every patient take every test, but it keeps defaulting to the Accuracy metric, which can't be calculated because every patient has missing values.

I also can't replace the missing values with a constant, because that would distort the diagnosis: a patient can legitimately score 0 on a test, which indicates a severe memory deficit. I have tried several custom metrics to get around the default Accuracy metric, but none of them work. I've tried:
```r
model <- train(
  x = combined_data[, c("age", "hvlt_immed_total", "hvlt_delayed_recall",
                        "hvlt_retention", "hvlt_recogn", "cvlt_tot",
                        "cvlt_ldfr", "cvlt_recog", "wms_iv_lm1",
                        "wms_iv_lm2", "wms_iv_lm_rec", "bvmt_immed_total",
                        "bvmt_delayed_recall", "bvmt_retention",
                        "bvmt_recogn", "rcf_copy", "rcf_delayed",
                        "rcf_recogn")],
  y = combined_data$Disease,
  method = "glm",
  family = "binomial",
  trControl = ctrl,
  na.action = "na.pass"
)
```
as well as
```r
model <- train(
  x = combined_data[, c("age", "hvlt_immed_total", "hvlt_delayed_recall",
                        "hvlt_retention", "hvlt_recogn", "cvlt_tot",
                        "cvlt_ldfr", "cvlt_recog", "wms_iv_lm1",
                        "wms_iv_lm2", "wms_iv_lm_rec", "bvmt_immed_total",
                        "bvmt_delayed_recall", "bvmt_retention",
                        "bvmt_recogn", "rcf_copy", "rcf_delayed",
                        "rcf_recogn")],
  y = combined_data$Disease,
  method = "glm",
  family = "binomial"
)
```
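Both calls most likely run into the same underlying problem. With `method = "glm"` the model is fitted by `glm()`, and under R's factory-fresh `na.action` (`na.omit`, taken from `getOption("na.action")`) every row containing any NA is dropped. If every patient skipped at least one test, no rows survive, so there is nothing to fit and no Accuracy to report. The toy data below is hypothetical (made-up scores, a few of the real column names) and only illustrates that mechanism:

```r
# Hypothetical toy data: three patients, each of whom skipped at least
# one test, so every row contains at least one NA.
toy <- data.frame(
  Disease          = factor(c("Alzheimers", "Parkinsons", "Alzheimers")),
  age              = c(71, 66, 78),
  hvlt_immed_total = c(14, NA, 9),
  cvlt_tot         = c(NA, 41, 30),
  rcf_copy         = c(30, 28, NA)
)

# Which tests each patient is missing (TRUE = not taken)
print(is.na(toy[, -1]))

# Rows with no missing values at all
n_complete <- sum(complete.cases(toy))
cat("complete rows:", n_complete, "\n")

# na.omit keeps only complete rows, which here leaves an empty data set
n_after_omit <- nrow(na.omit(toy))
cat("rows left after na.omit:", n_after_omit, "\n")
```

Both counts come out as 0, which is why changing only the metric cannot help: the missing values have to be dealt with before (or inside) the model fit.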
So I tried writing a custom summary function that uses the F1 score as the metric instead of accuracy. Defining it worked at first:
```r
custom_F1 <- function(data, lev = NULL, model = NULL) {
  predictions <- predict(model, data)
  actual <- data$Disease
  cm <- confusionMatrix(predictions, actual)
  precision <- cm$byClass["Pos Pred Value"]
  recall <- cm$byClass["Sensitivity"]
  f1_score <- ifelse(precision + recall == 0, 0, 2 * (precision * recall) / (precision + recall))
  return(f1_score)
}
```
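For reference, caret calls a `summaryFunction` with a data frame of held-out observed and predicted classes (columns `obs` and `pred`), the class levels `lev`, and `model` set to the method's name rather than to a fitted model, so `predict(model, data)` and `data$Disease` will not work inside it. Below is a minimal sketch of an F1 summary in that shape; `f1_summary` is a hypothetical name, and treating the first factor level as the "positive" class is an assumption (it matches the convention caret's own two-class summaries use):

```r
# Sketch of an F1 summary function in the shape caret's trainControl()
# expects. Assumption: lev[1] is the "positive" class.
f1_summary <- function(data, lev = NULL, model = NULL) {
  if (is.null(lev)) lev <- levels(data$obs)
  positive <- lev[1]

  # Confusion counts for the positive class
  tp <- sum(data$pred == positive & data$obs == positive)
  fp <- sum(data$pred == positive & data$obs != positive)
  fn <- sum(data$pred != positive & data$obs == positive)

  # Return 0 rather than NA when a ratio is undefined
  precision <- if (tp + fp == 0) 0 else tp / (tp + fp)
  recall    <- if (tp + fn == 0) 0 else tp / (tp + fn)
  f1 <- if (precision + recall == 0) 0 else
    2 * precision * recall / (precision + recall)

  # caret needs a *named* numeric vector; `metric =` refers to these names
  c(F1 = f1, Precision = precision, Recall = recall)
}

# Usage with made-up held-out predictions
lv <- c("Alzheimers", "Parkinsons")
held_out <- data.frame(
  obs  = factor(c("Alzheimers", "Alzheimers", "Parkinsons", "Parkinsons"), levels = lv),
  pred = factor(c("Alzheimers", "Parkinsons", "Alzheimers", "Parkinsons"), levels = lv)
)
res <- f1_summary(held_out, lev = lv)
print(res)  # F1 = 0.5, Precision = 0.5, Recall = 0.5
```

Returning 0 instead of NA for undefined ratios is a design choice: it keeps a fold with no predicted positives from turning the whole resampling summary into NA.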
Then I ran this training call, which also seemed to work at first:
```r
model <- train(
  x = combined_data[, c("age", "hvlt_immed_total", "hvlt_delayed_recall",
                        "hvlt_retention", "hvlt_recogn", "cvlt_tot",
                        "cvlt_ldfr", "cvlt_recog", "wms_iv_lm1",
                        "wms_iv_lm2", "wms_iv_lm_rec", "bvmt_immed_total",
                        "bvmt_delayed_recall", "bvmt_retention",
                        "bvmt_recogn", "rcf_copy", "rcf_delayed",
                        "rcf_recogn")],
  y = combined_data$Disease,
  method = "glm",
  family = "binomial",
  trControl = trainControl(method = "cv",                # Cross-validation
                           summaryFunction = custom_F1,  # Use custom F1 score metric
                           na.action = na.pass           # Pass through NA values
  )
)
```
After I ran all of that, it seemed to have succeeded, but R then said that the object `model` doesn't exist. I don't know what to do. Please let me know if you can help! Thank you!
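`trainControl()` has no `na.action` argument, so the last call most likely stopped with an "unused argument" error before anything was assigned, which would explain why `model` does not exist. In caret, `metric` is an argument of `train()` (its default for a factor outcome is `"Accuracy"`), `summaryFunction` goes in `trainControl()`, and `na.action` belongs to the formula interface of `train()`. The sketch below shows only that placement, on made-up complete data. It uses caret's built-in `prSummary()` (which reports `AUC`, `Precision`, `Recall` and `F`, needs `classProbs = TRUE` and the MLmetrics package) and runs only if caret and MLmetrics are installed; it does not address the missing values themselves.

```r
# Sketch only: made-up, complete toy data with a two-level outcome.
# Runs only when the caret and MLmetrics packages are installed.
set.seed(1)
n <- 100
toy <- data.frame(
  age              = round(rnorm(n, 70, 6)),
  hvlt_immed_total = round(rnorm(n, 20, 5)),
  cvlt_tot         = round(rnorm(n, 40, 10))
)
disease <- factor(sample(c("Alzheimers", "Parkinsons"), n, replace = TRUE))

model <- NULL
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("MLmetrics", quietly = TRUE)) {
  ctrl <- caret::trainControl(
    method          = "cv",
    number          = 5,
    classProbs      = TRUE,              # prSummary needs class probabilities
    summaryFunction = caret::prSummary   # reports AUC, Precision, Recall, F
  )
  model <- caret::train(
    x         = toy,
    y         = disease,
    method    = "glm",
    family    = "binomial",              # passed through to glm()
    metric    = "F",                     # metric is a train() argument
    trControl = ctrl
  )
  print(model$results)
}
```

A custom summary function of the same shape can be supplied instead of `prSummary`, with `metric` set to whichever name that function returns.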
Kate Ogden is a new contributor to this site. Take care in asking for clarification, commenting, and answering.