I have scoured many similar questions and tried to address any elements that could be the issue, but the output never changes.
I have a dataset (claims) with 607,697 observations and the following columns:
> names(claims)
[1] "Counts" "exposure" "distance" "weight" "age" "carage" "state" "gender"
where “state” and “gender” are factor variables.
Here is a dataframe with the first 6 observations just as an example:
claims <- data.frame(
Counts = c(0, 0, 0, 1, 0, 0),
exposure = c(0.9935712, 0.8281415, 0.8833985, 0.9648020, 0.9364159, 0.8541331),
distance = c(26, 13, 3, 2, 3, 2),
weight = c(1066, 2386, 1308, 2127, 1370, 883),
age = c(52, 43, 56, 24, 49, 30),
carage = c(5, 2, 8, 4, 9, 3),
state = factor(c("QLD", "NSW", "SA", "ACT", "VIC", "ACT")),
gender = factor(c("male", "female", "male", "male", "male", "male"))
)
I have fit a Poisson GLM:
Model1 <- glm(Counts ~ weight+distance+age+carage+gender,
data=claims, family=poisson(), offset=log(exposure))
This model excludes the state variable. I want to use it to predict lambda for some new data:
data1 <- data.frame(weight=2000, distance=15, age=30, carage=4,
gender=factor("female", levels=levels(claims$gender)))
lam_bench <- predict(Model1, newdata=data1, type="response")
I have tried and checked many things (changing the “gender” variable, calling predict and predict.glm, making sure the column names align, and so on), but no matter what I do, lam_bench is a numeric vector of length 607,697 (the number of observations in my claims dataset) rather than a single value. These values also don’t match the fitted values from the model, so I am not sure what they are.
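One thing I have not ruled out is the offset. As far as I understand, when the offset is passed through the offset= argument, predict() re-evaluates log(exposure) against newdata, and data1 has no exposure column. If a full-length exposure vector happens to be visible in the workspace (for example after attach(claims)), that might explain a result of length 607,697. Below is a minimal sketch of that idea on simulated data; the toy data frame, the names fit and new1, and the choice exposure = 1 are all assumptions for illustration, not my real data.

```r
# Simulated stand-in for the claims data (hypothetical values, not the real dataset)
set.seed(1)
n <- 200
toy <- data.frame(
  exposure = runif(n, 0.5, 1),
  weight   = round(runif(n, 800, 2500)),
  distance = rpois(n, 8),
  age      = sample(18:80, n, replace = TRUE),
  carage   = sample(0:15, n, replace = TRUE),
  gender   = factor(sample(c("female", "male"), n, replace = TRUE))
)
toy$Counts <- rpois(n, lambda = 0.3 * toy$exposure)

# Same model structure as Model1, with the offset given via the offset= argument
fit <- glm(Counts ~ weight + distance + age + carage + gender,
           data = toy, family = poisson(), offset = log(exposure))

# New data that also carries an exposure value, so log(exposure) can be
# evaluated on newdata instead of being looked up elsewhere.
# exposure = 1 is an assumed choice: it gives the rate per unit of exposure.
new1 <- data.frame(weight = 2000, distance = 15, age = 30, carage = 4,
                   gender = factor("female", levels = levels(toy$gender)),
                   exposure = 1)

lam <- predict(fit, newdata = new1, type = "response")
length(lam)  # one prediction per row of new1
```

If this is indeed the cause, adding an exposure column to data1 should give a single predicted value for lam_bench.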
Any help would be greatly appreciated, and I wouldn’t be surprised if there is something very simple I have overlooked, but I just cannot figure out the problem.