I am fitting a logistic regression to assess how accurately the variable ‘fasting_status’ (0 = non-fasted, 1 = fasted) can be classified from three numeric variables (a1c, glu, and uc_ratio).
My model accuracy keeps printing as 0, even though I do not think that is true (see below), and I’m wondering where (or if) I’m going wrong.
I’m pasting my code and data below. Any help is appreciated. Thank you!
```
> set.seed(123)
> splits <- createDataPartition(1:nrow(a1c.logistic), p = 0.6, list = FALSE)
> train.data <- a1c.logistic[splits[,1],]
> test.data <- a1c.logistic[setdiff(1:nrow(a1c.logistic), splits[,1]),]
> #fit the logistic regression model
> model <- glm( fasting_status ~., data = train.data, family = binomial)
> summary(model)

Call:
glm(formula = fasting_status ~ ., family = binomial, data = train.data)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.47639    4.65923  -3.107  0.00189 **
a1c           1.85038    0.76265   2.426  0.01526 *
glu           0.05356    0.02251   2.379  0.01736 *
uc_ratio     -0.02304    0.01500  -1.537  0.12439
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 66.459 on 47 degrees of freedom
Residual deviance: 51.781 on 44 degrees of freedom
AIC: 59.781

Number of Fisher Scoring iterations: 4

> #make predictions
> probabilities <- model %>% predict(test.data, type = "response")
> predicted.classes <- ifelse(probabilities > 0.5, "pos", "neg")
> predicted.classes
    2     3     4     9    11    13    14    19    23    29    34    36    37    38    39    41    45    46
"neg" "neg" "neg" "neg" "pos" "pos" "neg" "neg" "neg" "neg" "neg" "neg" "neg" "neg" "pos" "neg" "pos" "pos"
   48    53    54    55    59    65    66    67    68    70    71
"pos" "pos" "pos" "pos" "pos" "pos" "pos" "pos" "pos" "pos" "neg"
> ## here, I can see that some of the negs and pos's align with the 0s and 1s in my original dataset, so my mean accuracy shouldn't be zero
> #check the dummy coding
> contrasts(test.data$fasting_status)
  1
0 0
1 1
> # Model accuracy
> mean(predicted.classes == test.data$fasting_status)
[1] 0
> # mean accuracy is zero, which seems wrong?
```
Data:
```
> dput(a1c.logistic)
structure(list(fasting_status = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L), levels = c("0", "1"), class = "factor"),
a1c = c(4.3, 4.5, 4.4, 2.9, 4.3, 4.4, 4.2, 4.5, 4.2, 4.2,
4.5, 4.5, 4.8, 4.5, 5.2, 4.9, 4.6, 4.2, 4.4, 4.9, 4.6, 4.5,
4.4, 4.8, 4.5, 4.1, 3.8, 3.1, 4.3, 4.6, 4.7, 4.9, 4.6, 4.4,
3.1, 4.6, 4.4, 4.2, 4.4, 5.2, 4.4, 5.1, 4.6, 4.7, 5.2, 4.7,
4.7, 4.6, 4.4, 4.4, 4.2, 4.5, 4.6, 4.4, 3.2, 4.8, 5.2, 5.2,
4.6, 4.9, 5.6, 4.6, 4.9, 4.5, 5.1, 4.6, 4.9, 4.6, 4.3, 4.6,
4.6, 4.3, 4.6, 4.3, 4.6, 6.5, 4.8), glu = c(88.5, 98, 117.5,
53, 108.5, 106, 105, 101, 91, 99.5, 128.5, 113, 114, 121.5,
121, 131.5, 160.5, 96, 110, 140, 119.5, 115.3, 112, 143.5,
116.5, 116.5, 111, 139.5, 123.5, 131, 113, 137, 114, 98.5,
124.5, 123.5, 111.5, 111, 127, 123, 137.5, 119, 107, 130.5,
142.5, 115, 133.5, 119, 148.3, 125.5, 138.5, 106.5, 153.5,
126.5, 179, 145, 143, 124.5, 134, 146.5, 127.5, 124.5, 123,
129, 145.3, 125.5, 146.5, 153.5, 115.5, 128, 110.5, 131,
139.5, 124, 154, 94, 76.3), uc_ratio = c(30.65603924, 15.32801962,
60.59075991, 7.39973361, 57.84661317, 27.46781116, 16.0944206,
6.131207848, 94.61568474, 19.50838861, 7.803355443, 19.41549152,
7.464079119, 19.67095851, 29.50643777, 62.94706724, 80.472103,
25.75107296, 73.57449418, 39.01677721, 41.13018598, 10.62933697,
7.803355443, 30.04291845, 32.75355771, 49.52129416, 5.969860273,
22.72153497, 7.153075823, 75.61823012, 23.50296342, 53.64806867,
11.19611891, 38.25340549, 88.36152487, 51.50214592, 9.196811772,
41.98544505, 6.35828962, 9.196811772, 94.87237407, 12.87553648,
6.035407725, 7.39973361, 10.72961373, 11.70503316, 9.035464197,
16.34988759, 11.68917269, 35.11509949, 61.85306741, 11.36076748,
12.2624157, 7.153075823, 14.30615165, 10.40447392, 3.901677721,
52.11526671, 21.45922747, 30.49469166, 81.06819266, 1.950838861,
34.33476395, 8.0472103, 24.94635193, 9.754194304, 64.3776824,
9.196811772, 11.92179304, 34.87124464, 74.39198856, 124.4635193,
13.79521766, 5.722460658, 66.76204101, 69.9757432, 19.50838861
)), row.names = c(NA, -77L), class = "data.frame")
```
Edit: I wonder if the problem is that my variable 'fasting_status' should be 0s and 1s, but in my dput output it is showing as 1L and 2L? (I used to have the variable coded as 0 and 1, but I changed it for the purpose of this analysis. I've tried clearing the environment and everything.)
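
A small check on the two things I'm unsure about, as a sketch only: it uses a toy factor and made-up probabilities (not my real data or model output), and all variable names here are just for illustration. It looks at what `dput()` is showing for a factor with levels "0" and "1", and at what `==` returns when character labels are compared against that factor.

```r
# Toy factor with the same levels as fasting_status (illustrative values only)
fasting_status <- factor(c("0", "0", "1", "1", "0"), levels = c("0", "1"))

# dput() prints a factor's internal integer codes, which start at 1,
# so levels "0"/"1" are stored as 1L/2L
codes  <- as.integer(fasting_status)    # 1 1 2 2 1
labels <- as.character(fasting_status)  # "0" "0" "1" "1" "0"

# Made-up predicted probabilities, for illustration only
probabilities <- c(0.2, 0.7, 0.9, 0.4, 0.1)

# "pos"/"neg" are compared against the labels "0"/"1", so nothing matches
pred_posneg <- ifelse(probabilities > 0.5, "pos", "neg")
acc_posneg  <- mean(pred_posneg == fasting_status)  # 0

# Predictions labelled "1"/"0" share the factor's labels and can match
pred_01 <- ifelse(probabilities > 0.5, "1", "0")
acc_01  <- mean(pred_01 == fasting_status)          # 0.6
```

If this toy case carries over to my data, the 1L/2L in the dput output would just be how factors are stored, and the labels used for `predicted.classes` would be the thing to look at.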