I have been trying to use GAMs to model the relationship between total economic damages (due to a certain type of event) and a variety of factors, including the total number of events, the total number of people affected, the level of infrastructure development, etc. I am pretty new to GAMs and would really appreciate some help!
Here’s what I have tried so far:
library(mgcv)
model1 <- gam(total_damages ~ s(total_events) + s(total_affected) + s(coastlines) + s(total_gdp, k = 1) + s(urban_landarea) + s(infrastructure_index), family = tw(link = "log"))
I used tw(link = "log") because total_damages is not normally distributed (I plotted a histogram to check). I also noticed that the variance of this variable is much bigger than its mean. However, if using the Tweedie family is wrong here, please let me know. Also, I'm not sure whether I should be using a smooth function for all of the independent variables. I noticed that total_gdp has a linear relationship with total_damages, so I set k = 1 for it. I'm also not sure whether there are any repercussions to using smooth functions for so many variables.
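A minimal sketch of one way to check these choices, assuming the variables sit in a data frame. The data frame `dat` and all of its values are simulated stand-ins, not the real data. It enters `total_gdp` as an ordinary linear term instead of a smooth and then runs mgcv's built-in diagnostics on the Tweedie fit:

```r
library(mgcv)

set.seed(1)
n <- 200

# Simulated stand-in data: `dat` and its values are hypothetical
dat <- data.frame(
  total_gdp            = runif(n, 1, 10),
  infrastructure_index = runif(n, 0, 1)
)
mu <- exp(0.5 + 0.2 * dat$total_gdp + sin(2 * pi * dat$infrastructure_index))
dat$total_damages <- rTweedie(mu, p = 1.5, phi = 1)  # non-negative, right-skewed

# total_gdp enters as a plain linear term; infrastructure_index as a smooth
m <- gam(total_damages ~ total_gdp + s(infrastructure_index),
         family = tw(link = "log"), data = dat, method = "REML")

# Residual plots plus a check of whether the basis dimension k is adequate
gam.check(m)
```

If the residual plots from `gam.check()` look reasonable, that is some evidence the family and link are not badly off. It is not proof that Tweedie is the right choice.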
I want to show you the results I got from this model.
summary(model1)
[Image: model1 summary output]
I don't think that the p-values are of much use in this case. I wonder whether the R-sq.(adj) and "Deviance explained" values are worth looking at here. Please let me know if they mean something important in the case of GAMs.
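For reference, a small sketch of where these two numbers come from, using a simulated single-predictor model. The names `x`, `y` and `fit` are hypothetical and the data are made up. "Deviance explained" is the proportional drop from the null deviance to the model deviance:

```r
library(mgcv)

set.seed(2)
n <- 200
x <- runif(n)                                             # simulated predictor
y <- rTweedie(exp(1 + sin(2 * pi * x)), p = 1.5, phi = 1)  # simulated response

fit <- gam(y ~ s(x), family = tw(link = "log"), method = "REML")
sm  <- summary(fit)

sm$r.sq      # adjusted R-squared, as printed under "R-sq.(adj)"
sm$dev.expl  # proportion of deviance explained
sm$s.table   # edf and approximate p-value for each smooth term

# Deviance explained recomputed by hand from the fitted object
dev_expl_manual <- (fit$null.deviance - fit$deviance) / fit$null.deviance
```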
I want to show one of the charts here:
plot(model1)
Here's the link to the image: https://imgur.com/7gu5DKr
The linked image shows the plot of an independent variable (infrastructure_index) against its smooth, "s(infrastructure_index)". The line looks fairly straight, but the coefficients (from coef(model1)) frequently fluctuate between negative and positive values. Does this mean that there is an irregular relationship between infrastructure_index and total_damages?
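A sketch of how the values returned by `coef()` relate to the plotted curve, on simulated data with hypothetical names `x`, `y` and `fit`. In mgcv each coefficient of a smooth is a weight on one basis function, so the curve is the weighted sum of the basis columns. The sign of an individual coefficient does not by itself describe the shape of the relationship:

```r
library(mgcv)

set.seed(3)
n <- 200
x <- runif(n)                                             # simulated predictor
y <- rTweedie(exp(1 + sin(2 * pi * x)), p = 1.5, phi = 1)  # simulated response

fit <- gam(y ~ s(x), family = tw(link = "log"), method = "REML")

# Basis functions evaluated at the data (one column per coefficient)
X   <- predict(fit, type = "lpmatrix")
idx <- grep("s(x)", colnames(X), fixed = TRUE)  # columns belonging to s(x)

# Smooth rebuilt by hand: basis columns times their coefficients
smooth_manual <- drop(X[, idx] %*% coef(fit)[idx])

# The same smooth as mgcv reports it (on the link scale)
smooth_terms <- predict(fit, type = "terms")[, "s(x)"]

same <- isTRUE(all.equal(unname(smooth_manual), unname(smooth_terms)))
```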
I then fitted another GAM, this time using only one independent variable, infrastructure_index:
model2 <- gam(total_damages ~ s(infrastructure_index))
Here's the model summary:
[Image: model2 summary output]
The R-sq.(adj) and "Deviance explained" values have dropped. But the plot has become much clearer, and I feel like I have a better understanding of the relationship between infrastructure_index and total_damages: https://i.sstatic.net/EDY8h4wZ.png
Is the GAM model with multiple variables a better tool for understanding the relationship between the independent variables and total_damages? Are GAMs with single variables less useful? Why do the two graphs differ so much? When should I use GAMs with multiple vs single variables?
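A minimal sketch of how a single-variable and a multi-variable fit could be compared like for like, assuming both use the same family. The predictors `x1` and `x2` are simulated and deliberately correlated, and all names are hypothetical. With correlated predictors, the smooth of `x1` on its own also absorbs part of the effect of `x2`, so its plot can differ from the partial effect in the larger model:

```r
library(mgcv)

set.seed(4)
n  <- 300
x1 <- runif(n)
x2 <- 0.7 * x1 + 0.3 * runif(n)           # simulated, correlated with x1
mu <- exp(1 + sin(2 * pi * x1) + 2 * x2)
y  <- rTweedie(mu, p = 1.5, phi = 1)       # simulated response

# Same family and link in both models so the fits are comparable
single <- gam(y ~ s(x1),         family = tw(link = "log"), method = "REML")
multi  <- gam(y ~ s(x1) + s(x2), family = tw(link = "log"), method = "REML")

cmp <- AIC(single, multi)  # lower AIC suggests the better-supported model
print(cmp)

# Effect of x1 alone vs. its partial effect after accounting for x2
plot(single, select = 1)
plot(multi,  select = 1)
```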