I am currently writing an update of an article on vulture population trends (Vulture article), looking at regular feeding at four long-term vulture restaurant sites from July 2016 to December 2023. At each feeding event, we recorded the total number of vultures feeding for each of the three species (RHV, SBV, WRV).

For each species we fitted a set of generalized linear models. For the regular restaurant data, we modelled the count at each site as a function of either a linear or a quadratic effect of date, fixed effects of site and season, and the interactions among them. Season was a two-level factor covering the six months with either the highest or the lowest average rainfall in our study area: "wet" from May to October and "dry" from November to April. At vulture restaurants, for all species, the counts of birds were best modelled by date² × site × season.

I would therefore like to present the data the same way as Figure 3 of the article, on a scatterplot showing the expected counts for each species, the 95% confidence interval, and the actual restaurant counts at each site, averaged by season. I managed to get the plots by site, but I am having an issue with the line for the predicted count averaged by season.
Here is a glimpse of my data:
Rows: 734
Columns: 19
$ Site “SPWS”, “CWS”, “CWS”, “SPWS”, “EPL”, “CWS”, “SPWS”, “SPWS”, “CWS”…
$ Date 2016-07-10, 2016-07-12, 2016-07-20, 2016-07-20, 2016-07-23…
$ RHV 7, 7, 9, 11, 2, 7, 4, 4, 8, 3, 8, 3, 13, 2, 4, 0, 0, 2, 13, 2, 11, 4…
$ SBV 28, 3, 5, 37, 0, 4, 24, 25, 2, 0, 4, 39, 3, 0, 28, 0, 32, 23, 3, 0…
$ WRV 63, 17, 26, 35, 3, 24, 45, 38, 19, 0, 16, 64, 18, 1, 62, 0, 78, 46…
$ Total 98, 27, 40, 83, 5, 35, 73, 67, 29, 3, 28, 106, 34, 3, 94, 0, 110, 71…
$ Month 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10…
$ Year 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016…
$ Season “Wet”, “Wet”, “Wet”, “Wet”, “Wet”, “Wet”, “Wet”, “Wet”, “Wet”, “Wet”…
$ Month_Text “July”, “July”, “July”, “July”, “July”, “August”, “August”, “August”…
$ Date_Num 9, 11, 19, 19, 22, 40, 40, 50, 51, 59, 71, 71, 81, 81, 81, 81, 88…
$ Predicted_CountRHV 3.516354, 7.404192, 7.414502, 3.541526, 1.658747, 7.440738, 3.594323…
$ Predicted_CountSBV 2.659130e+01, 2.074335e+00, 2.100972e+00, 2.660040e+01, 2.220446e-16…
$ Predicted_CountWRV 4.816801e+01, 1.476080e+01, 1.477168e+01, 4.813040e+01, 1.354734e+00…
$ Groups “2016 _ Wet”, “2016 _ Wet”, “2016 _ Wet”, “2016 _ Wet”, “2016 _ Wet”…
I first defined the best model for each species in R.
BestModelRHV <- modelP_RHV_Date2_Site_Season
BestModelSBV <- modelP_SBV_Date2_Site_Season
BestModelWRV <- modelP_WRV_Date2_Site_Season
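For readers who want something reproducible, here is a minimal sketch of how a date² × site × season model of this kind could be specified. It is not the actual model code: the Poisson family is only guessed from the `modelP_` prefix, the data are simulated, only the three site codes visible in the glimpse are used, and the day-zero date of 2016-07-01 is inferred from the `Date` and `Date_Num` columns.

```r
# Simulated stand-in for the real data (all values here are made up)
set.seed(1)
n <- 300
sim <- data.frame(
  Date_Num = sort(sample(0:2700, n, replace = TRUE)),
  Site     = factor(sample(c("CWS", "EPL", "SPWS"), n, replace = TRUE))
)
sim$Date <- as.Date("2016-07-01") + sim$Date_Num   # assumed day zero

# Season as defined in the text: wet = May to October, dry = November to April
month      <- as.integer(format(sim$Date, "%m"))
sim$Season <- factor(ifelse(month %in% 5:10, "Wet", "Dry"))

# Fake counts with a season effect and a mild trend
sim$SBV <- rpois(n, lambda = exp(2 + 0.5 * (sim$Season == "Wet") -
                                   0.0002 * sim$Date_Num))

# Quadratic effect of date, fully interacted with site and season.
# poly(Date_Num, 2) gives the linear and quadratic terms as orthogonal columns.
model_sketch <- glm(SBV ~ poly(Date_Num, 2) * Site * Season,
                    family = poisson, data = sim)

length(coef(model_sketch))   # one quadratic curve (3 coefficients) per site-season cell
```

With three sites and two seasons this gives six separate quadratic curves on the log scale, which matters later when a single line is drawn through all fitted values of a site.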
I then used fitted() (rather than predict()), which takes a model object and returns the fitted values, on the response scale, for the data points that were used to fit the model (i.e., the training data). In other words, it provides in-sample predictions.
data$Predicted_CountRHV <- fitted(BestModelRHV, type = "response")
data$Predicted_CountSBV <- fitted(BestModelSBV, type = "response")
data$Predicted_CountWRV <- fitted(BestModelWRV, type = "response")
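To make the difference between fitted() and predict() concrete, here is a small sketch on simulated data (the variables `x` and `y` are made up). For a `glm`, fitted() already returns values on the response scale, and as far as I can tell the `type` argument is silently ignored by the default method, so predict() is the function to use when predictions at other covariate values or on the link scale are needed.

```r
set.seed(42)
d <- data.frame(x = runif(50, 0, 10))
d$y <- rpois(50, lambda = exp(0.2 + 0.15 * d$x))
m <- glm(y ~ x, family = poisson, data = d)

in_sample   <- fitted(m)                      # response scale, training rows only
via_predict <- predict(m, type = "response")  # same values when newdata is omitted
all.equal(unname(in_sample), unname(via_predict))

# Extra arguments to fitted() are swallowed by "...", so this is still response scale
same_anyway <- fitted(m, type = "link")

# predict() can also evaluate the model at covariate values that were never observed
new_x    <- data.frame(x = seq(0, 10, by = 0.5))
new_x$mu <- predict(m, newdata = new_x, type = "response")
```

A regular grid like `new_x` is what usually produces a smooth curve in a plot, because the line no longer depends on where the observations happen to fall.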
I then want to plot the models along with the observed data for all three species, by site. I therefore make one plot per species, with points for the observed counts, a line for the estimated mean count from the top model, and a grey ribbon for the 95% confidence interval of the mean.
library(ggplot2)

data$titleSBV <- "Slender-billed Vulture"
pSBV <- ggplot(data, aes(x = Date)) +
  # Observed counts
  geom_point(aes(y = SBV), color = "black", size = 1.25) +
  ylim(0, 80) +
  # Line for the estimated mean count
  geom_line(aes(y = Predicted_CountSBV), color = "black", linetype = "solid") +
  # Grey ribbon meant as the 95% confidence interval; note that +/- 1.96 is a fixed-width band, not 1.96 x standard error
  geom_ribbon(aes(ymin = Predicted_CountSBV - 1.96, ymax = Predicted_CountSBV + 1.96), fill = "grey", alpha = 0.5) +
  labs(x = "", y = "") +
  theme_bw() +
  facet_grid(cols = vars(Site), rows = vars(titleSBV))
print(pSBV)
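A note on the ribbon: adding and subtracting 1.96 from the fitted value gives a band of constant width, whereas a 95% confidence interval of the mean needs the standard error of each prediction. A common approach, sketched here on simulated data with made-up column names (`day`, `count`), is to ask predict() for standard errors on the link scale and back-transform the limits. This is a Wald-type interval and only one of several possible choices.

```r
set.seed(7)
d <- data.frame(day = 1:200)
d$count <- rpois(200, lambda = exp(2 + 0.004 * d$day))
m <- glm(count ~ day, family = poisson, data = d)

# Predict on the link (log) scale, where the normal approximation is more reasonable
p   <- predict(m, type = "link", se.fit = TRUE)
inv <- family(m)$linkinv   # inverse link; exp() for a Poisson GLM with log link

d$fit   <- inv(p$fit)
d$lower <- inv(p$fit - 1.96 * p$se.fit)
d$upper <- inv(p$fit + 1.96 * p$se.fit)

# Optional plot, only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_ci <- ggplot(d, aes(x = day)) +
    geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey", alpha = 0.5) +
    geom_point(aes(y = count), size = 1.25) +
    geom_line(aes(y = fit)) +
    theme_bw()
}
```

Because the limits are computed before back-transforming, the ribbon can never drop below zero, which a symmetric band on the count scale can do for small predicted counts.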
Here are the results I am getting:
[Image: output of the plotting code above]
The issue concerns the predicted line, which shows steps that I assume are linked to season. I tried another version grouped by season (I added group = Groups, where Groups are season-by-year groups), but I did not get the result I was expecting.
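The steps can be reproduced on simulated data, which supports the guess that they come from season. In a model with a season interaction, each season has its own curve, so a single line drawn through the fitted values in date order jumps from one curve to the other every time the season changes. The sketch below uses made-up data for a single site, a rescaled date `Yr` (years since an assumed day zero of 2016-07-01), and grouping by `Season` alone rather than by season within year.

```r
set.seed(3)
n <- 240
sim <- data.frame(Date_Num = sort(sample(0:2700, n, replace = TRUE)))
sim$Date   <- as.Date("2016-07-01") + sim$Date_Num   # assumed day zero
sim$Yr     <- sim$Date_Num / 365.25
month      <- as.integer(format(sim$Date, "%m"))
sim$Season <- factor(ifelse(month %in% 5:10, "Wet", "Dry"))
sim$SBV    <- rpois(n, exp(2.5 + 0.6 * (sim$Season == "Wet") - 0.05 * sim$Yr))

m <- glm(SBV ~ (Yr + I(Yr^2)) * Season, family = poisson, data = sim)
sim$fit <- fitted(m)

# Size of the jump between consecutive fitted values (log scale),
# split by whether the season changed between the two rows
jump          <- abs(diff(log(sim$fit)))
season_change <- diff(as.integer(sim$Season)) != 0
mean(jump[season_change])    # large: the line hops between the two curves
mean(jump[!season_change])   # small: the line stays on one smooth curve

# Grouping by Season draws one smooth curve per season instead of a staircase
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_season <- ggplot(sim, aes(x = Date)) +
    geom_point(aes(y = SBV), size = 1.25) +
    geom_line(aes(y = fit, group = Season, linetype = Season)) +
    theme_bw()
}
```

Grouping by season within year, as with `Groups`, would instead cut each curve into short six-month segments, which may explain why that attempt did not look as expected.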
I would be interested in getting a figure equivalent to Figure 3 in the article, for which I have added a screenshot below, but I got stuck on the phrase "averaged by season": I am not sure exactly what they did.
[Image: screenshot of Figure 3 from the article]
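One possible reading of "averaged by season", and it is only a guess at what the article did, is that the prediction at each date and site is averaged over the two season levels, which gives a single smooth curve per site. The sketch below does this on simulated data by averaging the wet and dry design-matrix rows, so that a standard error for the averaged linear predictor is also available. The data, the rescaled date `Yr`, the three site codes and the helper `design_for()` are all hypothetical.

```r
set.seed(11)
n <- 400
sim <- data.frame(
  Date_Num = sample(0:2700, n, replace = TRUE),
  Site     = factor(sample(c("CWS", "EPL", "SPWS"), n, replace = TRUE))
)
sim$Yr     <- sim$Date_Num / 365.25
month      <- as.integer(format(as.Date("2016-07-01") + sim$Date_Num, "%m"))
sim$Season <- factor(ifelse(month %in% 5:10, "Wet", "Dry"))
sim$SBV    <- rpois(n, exp(2 + 0.5 * (sim$Season == "Wet") +
                             0.3 * (sim$Site == "SPWS") - 0.05 * sim$Yr))

m <- glm(SBV ~ (Yr + I(Yr^2)) * Site * Season, family = poisson, data = sim)

# Regular grid of dates for every site
grid <- expand.grid(Yr = seq(0, 7.4, length.out = 75), Site = levels(sim$Site))

# Hypothetical helper: design matrix for the grid with Season fixed at one level
design_for <- function(season) {
  nd <- grid
  nd$Season <- factor(season, levels = levels(sim$Season))
  model.matrix(delete.response(terms(m)), data = nd)
}

# Equal-weight average of the two seasons on the link (log) scale
X_avg <- (design_for("Wet") + design_for("Dry")) / 2
eta   <- drop(X_avg %*% coef(m))
se    <- sqrt(rowSums((X_avg %*% vcov(m)) * X_avg))

grid$fit   <- exp(eta)              # geometric mean of the wet and dry expected counts
grid$lower <- exp(eta - 1.96 * se)
grid$upper <- exp(eta + 1.96 * se)
grid$Date  <- as.Date("2016-07-01") + round(grid$Yr * 365.25)
```

The columns `fit`, `lower` and `upper` can then be passed to geom_line() and geom_ribbon() with `data = grid`, faceted by `Site`, on top of the observed points. Averaging on the log scale yields a geometric rather than arithmetic mean of the two seasonal curves, so the result may differ slightly from a figure built by averaging on the count scale.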