I am running into issues with data extraction using the terra package. It appears to return the mean, and I am not seeing an option to select the most frequent value or the value with the largest coverage for a selected area.
I have two raster stack datasets I am working with for a given area.
Dataset1 is irrelevant to the task at hand, aside from the fact that it was used to create the extent that will be used to extract data from Dataset2. Dataset1 consists of 30×30 meter pixels that were aggregated into groups of 9 to define a zone/extent along a grid. This grid is what I will be using to extract Dataset2 values.
Dataset2 also consists of 30×30 meter pixels, but these pixels do not line up with the Dataset1 pixels. Everything is projected properly; this just seems to be how the data lines up.
For each grid cell I obtained the “mean” value of Dataset1. This part worked well. Dataset2, however, is land cover data, so even though each class is represented by a number, the number is categorical rather than quantitative: for example, 42 (forest) and 71 (grassland) cannot be averaged into a number that makes any sense. When I extracted the data I noticed it gave me 9 “values” per grid cell for each raster, and that even though the data pixels do not line up with the grid, it was still slicing them into nine 30×30 m sections along the grid lines. I didn’t tell it to do this in the code. I used:
LCValues <- extract(LC, Grid)
where “LC” is my SpatRaster stack and “Grid” is my vector polygon grid layer.
I presumed it would give me each value in the grid cell and that I could work out how much area each value covered, then select the value with the most weight. What happened instead with the 30×30 m subdivided areas was that it gave whole numbers where it could (when the value was constant across the 30×30 m area), but sometimes the value returned for a 30×30 m section was a weighted average of that section. For example, if it was 75% grassland and 25% forest, my value would be closer to the grassland value (in the 60s), but if it was 75% forest and 25% grassland it would be in the upper 40s. I do not need the 30×30 m “sub-grids” at all, but at the very least I would like it to choose one of the values in the grid cell rather than average them, because averaged data is unusable to me. I would prefer to extract the value that is most common (covers the greatest area) across the 90×90 m grid cell if possible.
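To make the problem concrete, here is a minimal base R sketch of the arithmetic. The 75/25 split is the example above; the `cells` table and its coverage fractions are made-up numbers for illustration only, not output from my data.

```r
# Class codes from the example above
forest <- 42
grass  <- 71

# What I seem to be getting: a coverage-weighted average of class codes
mostly_grass  <- weighted.mean(c(grass, forest), c(0.75, 0.25))  # 63.75
mostly_forest <- weighted.mean(c(forest, grass), c(0.75, 0.25))  # 49.25

# What I actually want: the class covering the largest share of the grid cell.
# Hypothetical pieces of one grid cell with the fraction each piece covers.
cells <- data.frame(
  class    = c(71, 71, 42, 71, 42),
  fraction = c(1, 0.5, 1, 0.25, 0.5)
)

# Sum the coverage per class, then keep the class with the largest total
area_by_class  <- tapply(cells$fraction, cells$class, sum)
majority_class <- as.numeric(names(which.max(area_by_class)))
majority_class  # 71 here (1.75 for grassland vs 1.5 for forest)
```

Neither 63.75 nor 49.25 is a valid land cover class, whereas the majority rule always returns one of the original codes.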
I have tried adding exact=TRUE and weights=TRUE but this doesn’t work because of the weird way it’s automatically subdividing into multiple 30×30 m sub-grids.
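For reference, this is roughly the workflow I had in mind, sketched on toy data rather than my real layers. Everything here is an assumption for illustration: the raster and polygon are made up (one 90×90 m cell offset 15 m from the pixel grid), the layer name `landcover` is hypothetical, and I am assuming the coverage column returned with exact=TRUE is the last column of the result. It only handles a single layer; a stack would need the same step per layer.

```r
library(terra)

# Toy 30 m raster: class 42 in the western columns, 71 in the east
LC <- rast(nrows = 6, ncols = 6, xmin = 0, xmax = 180, ymin = 0, ymax = 180,
           crs = "EPSG:32633")
values(LC) <- ifelse(xFromCell(LC, 1:ncell(LC)) < 100, 42, 71)
names(LC) <- "landcover"  # hypothetical layer name

# Toy 90 x 90 m grid cell, deliberately offset 15 m from the pixel edges
Grid <- as.polygons(ext(15, 105, 15, 105), crs = "EPSG:32633")

# One row per touched pixel, plus the fraction of that pixel inside the polygon
ex <- extract(LC, Grid, exact = TRUE)
names(ex)[ncol(ex)] <- "fraction"  # assumes coverage is the last column

# Total covered fraction per polygon and class
cover <- aggregate(fraction ~ ID + landcover, data = ex, FUN = sum)

# Keep, for each polygon, the class with the largest total coverage
majority <- do.call(rbind, lapply(split(cover, cover$ID), function(d) {
  d[which.max(d$fraction), ]
}))
majority
```

On this toy data I would expect class 42 to win, since it covers 75 m of the polygon's 90 m width. On my real data the extracted values are not always valid class codes, which is the part I cannot explain.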
The simplest solution would be to make my grid follow along the data but this isn’t possible because I need to be able to merge this with Dataset1 eventually for further analysis.
I also tried using a spatial join (so I could use intersect and largest=TRUE), but that isn’t possible with a SpatRaster. Is there another option I am missing? And does anyone know where these nine 30×30 m sub-grids within each grid cell are coming from?