I have presence/absence data for a species occurrence, and I fitted a binomial GAM using sea surface temperature (sst) as the predictor variable.
My df object is a data.frame with the presence/absence data for the species (0/1), the date and coordinates of each point (WGS84), and the corresponding sst extracted from satellite observations.
gam <- mgcv::gam(sfr_presence ~ s(sst), family = binomial, data = df, method = "REML")
summary(gam)
Family: binomial
Link function: logit

Formula:
sfr_presence ~ s(sst)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.17801    0.05595   -56.8   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
         edf Ref.df Chi.sq p-value
s(sst) 5.853  6.862    553  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.0385   Deviance explained = 9.54%
-REML = 5886.8  Scale est. = 1         n = 25297
plot(gam,trans=plogis)
Figure: fitted sst function with plogis transformation
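For context on what this plot shows: `plot.gam` draws each smooth centred around zero on the link scale, and `trans` is applied to that centred smooth, after adding `shift`, which defaults to 0. So with `trans = plogis` alone the curve does not include the model intercept. The sketch below illustrates this on simulated data; the data frame `sim`, the model `m` and the simulated effect are hypothetical stand-ins, not the data from this question.

```r
library(mgcv)

# Simulated stand-in for the real data (hypothetical values)
set.seed(1)
n   <- 2000
sim <- data.frame(sst = runif(n, 16, 26))
eta <- -3 + 1.5 * sin((sim$sst - 16) / 10 * pi)   # true linear predictor
sim$presence <- rbinom(n, 1, plogis(eta))

m <- gam(presence ~ s(sst), family = binomial, data = sim, method = "REML")

# Default: plogis() is applied to the centred smooth only, so the curve
# sits around 0.5 even though presences are rare.
plot(m, trans = plogis)

# Adding the intercept before transforming puts the curve on the
# probability scale.
plot(m, trans = plogis, shift = coef(m)[1])

# Check: intercept + smooth term, back-transformed, matches type = "response".
nd      <- data.frame(sst = c(18, 22, 25))
smooth  <- predict(m, newdata = nd, type = "terms")[, "s(sst)"]
by_hand <- plogis(coef(m)[1] + smooth)
direct  <- predict(m, newdata = nd, type = "response")
all.equal(as.numeric(by_hand), as.numeric(direct))
```

Under this reading, the second `plot()` call is the one whose y-axis is comparable with the output of `predict(..., type = "response")`.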
I’m trying to predict the probability of occurrence of this species for a new dataset, but the values returned by the predict function are very low compared with the scale of the fitted sst function plotted from the GAM.
The new dataset includes sst data from the same source and for the same area used for model fitting.
pred <- predict(gam, newdata = df_test, type = "response", se.fit = TRUE)
summary(pred$fit)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
0.02190 0.09596 0.11609 0.10250 0.11829 0.12178      21
summary(df_test$sst)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  19.60   21.24   21.95   21.99   22.72   24.43      21
If the mean sst in the new dataset is around 22 °C, shouldn't the average prediction be around 0.7?
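A rough arithmetic check may help frame the comparison. It assumes that the plotted curve is `plogis()` of the centred smooth without the intercept (the `plot.gam` default when `shift` is not set) and that 0.7 is the value read off the plot near 22 °C. The variable names are illustrative only.

```r
# Values taken from the summary output and the question
intercept  <- -3.17801   # (Intercept) estimate, on the logit scale
plot_value <- 0.7        # value read off the plogis-transformed plot near 22 C

# Fitted probability where the centred smooth equals zero
p_baseline <- plogis(intercept)                # about 0.04

# Undo the plot transformation to recover the smooth on the logit scale
smooth_logit <- qlogis(plot_value)             # about 0.85

# Add the intercept before transforming back to a probability
p_at_22 <- plogis(intercept + smooth_logit)    # about 0.09

c(baseline = p_baseline, at_22 = p_at_22)
```

Under these assumptions, a plot value of 0.7 corresponds to a probability of roughly 0.09, which is the same order of magnitude as the predictions summarised above.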