I am trying to use a redundancy analysis (RDA) to find out how strongly certain habitat variables are correlated with bullfrog abundance.
library(vegan)  # provides rda()
rda_b2 <- rda(formula = d_final_scaled$BF_Conc_TF ~ mean_PVI_E + Refugia + Shallow_BL + Oxygen,
              data = d_final_scaled)
However, half of our ponds did not have any bullfrogs. This makes the response strongly right-skewed, which violates the normality assumption of an RDA. Unsurprisingly, a Shapiro-Wilk test on the residuals of my model gave p = 4.786e-07.
I tried a log transformation, log10(d_final$BF_Conc + 1), and also scaled my data. However, I came to realize that this was probably pointless: no matter which transformation I use, the ‘0’ observations will all end up at one identical value and skew the data all the same. Is there anything I can do so that I can use this dataset for an RDA?
Below is a simple histogram of my log-transformed and scaled bullfrog data:
[image: histogram of the log-transformed and scaled bullfrog data]
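To illustrate the problem without my real data, here is a minimal sketch using simulated values. Everything in it is an assumption for illustration only: the sample size, the lognormal distribution, and the names `conc` and `conc_tf` are hypothetical and are not taken from my dataset. It applies the same two steps (log10(x + 1), then scaling) to a variable in which half of the observations are zero.

```r
# Simulated stand-in for the real data: all names and values are hypothetical.
set.seed(1)
n <- 40

# Half of the "ponds" have no bullfrogs; the rest get positive, skewed values.
conc <- c(rep(0, n / 2), rlnorm(n / 2, meanlog = 2, sdlog = 1))

# Same steps as described above: log10(x + 1), then centre and scale.
conc_tf <- as.numeric(scale(log10(conc + 1)))

# Every zero is mapped to the same transformed value, so the spike at the
# minimum is still there: this is the share of observations sitting on it.
mean(conc_tf == min(conc_tf))

# Shapiro-Wilk on the transformed values; with this many ties at the
# minimum, the p-value is expected to stay very small.
shapiro.test(conc_tf)$p.value
```

Any monotonic transformation behaves the same way here, because it can move the block of zeros but cannot spread it out.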
PS: I read something about using a zero-inflated model and then using the residuals of that model in my RDA, but I don’t fully trust the source nor pretend to be familiar with such models. Is this a valid option?
Thanks for reading this far!