Here is the link to the sample data (the file is small, only 23 KB, but something particular about it may be what triggers the error):
https://drive.google.com/file/d/1TWkFIKhq9VZkFnhUrt6LxYmab54ouODd/view?usp=sharing
Here is my code for fitting Firth’s model. I get different errors on different runs (after restarting R or the R session). Sometimes the program just seems to be stuck (although Activity Monitor shows CPU usage at 99%); other times I get errors such as non-convergence, with a suggestion to increase the number of iterations, which does not really help.
library(caret)
library(logistf)
library(data.table)
# Define training control
train_control <- trainControl(method = "repeatedcv",
                              number = 3, repeats = 3,
                              savePredictions = TRUE,
                              classProbs = TRUE)
# Define the custom model function
firth_model <- list(
  type = "Classification",
  library = "logistf",
  loop = NULL,
  parameters = data.frame(parameter = c("none"), class = c("character"), label = c("none")),
  grid = function(x, y, len = NULL, search = "grid") {
    data.frame(none = "none")
  },
  fit = function(x, y, wts, param, lev, last, classProbs, ...) {
    data <- as.data.frame(x)
    data$group <- y
    logistf(group ~ ., data = data, control = logistf.control(maxit = 100), ...)
  },
  predict = function(modelFit, newdata, submodels = NULL) {
    as.factor(ifelse(predict(modelFit, newdata, type = "response") > 0.5, "AD", "control"))
  },
  prob = function(modelFit, newdata, submodels = NULL) {
    preds <- predict(modelFit, newdata, type = "response")
    data.frame(control = 1 - preds, AD = preds)
  }
)
train_proc <- fread("train_proc.csv")
# Training the model
set.seed(123)
firth.logist.model <- train(train_proc[, .SD, .SDcols = !c("group")],
                            train_proc$group,
                            method = firth_model,
                            trControl = train_control)
print(firth.logist.model)
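One thing that may be worth checking in the wrapper above is which class the fitted probability refers to, since the predict and prob functions treat it as the probability of "AD". The sketch below uses only base R and a made-up label vector (`y` and `y01` are hypothetical names, not from my data). It shows that a factor built from these labels puts "AD" first, so a level-order 0/1 coding (the convention glm uses for factor responses; whether logistf treats a factor response the same way is an assumption I have not verified) would make "control" the modeled event. Coding the response explicitly avoids relying on that.

```r
# Hypothetical labels standing in for train_proc$group
y <- factor(c("AD", "control", "control", "AD"))

# Factor levels are sorted alphabetically by default
levels(y)

# Level-order coding: first level -> 0, second level -> 1
level_order_coding <- as.integer(y) - 1L

# Explicit coding so that 1 always means "AD",
# regardless of how the factor levels happen to be ordered
y01 <- as.integer(y == "AD")
```

With `y01` as the response, a fitted probability above 0.5 would correspond to "AD", which is what the wrapper's predict function assumes.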
Here is the most recent error (strictly speaking, a warning):
Warning in logistf(group ~ ., data = data, control = logistf.control(maxit = 100), :
Nonconverged PL confidence limits: maximum number of iterations for variables: (Intercept), x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24 exceeded. Try to increase the number of iterations by passing 'logistpl.control(maxit=...)' to parameter plcontrol
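As far as I can tell, this warning is about the profile-likelihood (PL) confidence limits rather than the coefficient fit itself, and `control = logistf.control(maxit = 100)` does not set the PL iteration limit. Below is a minimal sketch of the two options the logistf interface documents: raising the PL limit through `plcontrol`, or setting `pl = FALSE` so that Wald intervals are used and the PL search is skipped. It uses the `sex2` data bundled with logistf instead of my file, and the value 1000 is an arbitrary choice for illustration. I have not confirmed that either option resolves the problem on my data.

```r
library(logistf)

# sex2 ships with logistf; it stands in for train_proc.csv here
data(sex2)

ctrl <- logistf.control(maxit = 100)

# Option 1: keep PL intervals, but raise their own iteration limit.
# plcontrol is separate from control.
fit_pl <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2,
                  control   = ctrl,
                  plcontrol = logistpl.control(maxit = 1000))

# Option 2: skip the PL search and use Wald intervals instead
fit_wald <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2,
                    control = ctrl,
                    pl = FALSE)

# Predicted probabilities, which is all a caret wrapper needs
p <- predict(fit_wald, newdata = sex2, type = "response")
```

Both calls estimate the same coefficients; only the way intervals and tests are computed differs. Since caret only uses the predictions, `pl = FALSE` could be passed inside the `fit` function of a custom model.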
The same code seems to run on some datasets but not on others. It could also be that my custom function is not suited to this particular dataset. I have gotten many different types of errors, and I am starting to wonder whether the logistf package itself is unstable.
To provide more info, here is my R version:
R.version
_
platform aarch64-apple-darwin20
arch aarch64
os darwin20
system aarch64, darwin20
status
major 4
minor 3.2
year 2023
month 10
day 31
svn rev 85441
language R
version.string R version 4.3.2 (2023-10-31)
nickname Eye Holes
Here are my package versions:
> packageVersion("caret")
[1] ‘6.0.94’
> packageVersion("logistf")
[1] ‘1.26.0’
> packageVersion("data.table")
[1] ‘1.14.10’