I am trying to model the proportions of a variable with four categories through time and at four different sites. Most sites have all four categories of the variable, but some sites only have three categories. I want to model the trend in the categories across years using a hierarchical GAM that includes a shared smooth, as well as a site-specific smooth.
How do I adjust my multinomial model in mgcv (below) for this to work? I’m assuming that a grouping factor needs to be added so the model knows which category each proportion belongs to.
Example:
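library(tidyverse)
library(mgcv)

set.seed(42)

# Simulated data: 10 sites x 10 years, with proportions a-d summing to 1.
# Sites I and J have no category c, so they only have three categories.
df <- expand.grid(site = LETTERS[1:10],
                  year = c(1:10)) %>%
  mutate(a = runif(100, 0, 0.5),
         b = runif(100, 0, 0.3),
         c = runif(100, 0, 0.2),
         c = if_else(site %in% LETTERS[1:8], c, NA_real_),
         d = if_else(site %in% LETTERS[1:8], 1 - a - b - c, 1 - a - b)) %>%
  # Long format: one row per site, year and category
  pivot_longer(cols = a:d, names_to = "var", values_to = "prop") %>%
  filter(!is.na(prop))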
m <- gam(list(prop ~ s(year) + s(year, site, bs = "sz"),
                   ~ s(year) + s(year, site, bs = "sz"),
                   ~ s(year) + s(year, site, bs = "sz"),
                   ~ s(year) + s(year, site, bs = "sz")),
         data = df, method = "REML", family = multinom(K = 4),
         control = gam.control(nthreads = 4))
Fitting the model fails with the following error:
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In mean.default(xx) : argument is not numeric or logical: returning NA
2: In Ops.factor(xx, shift[i]) : ‘-’ not meaningful for factors