I am asking the community for help with an issue using xarray + geopandas for time series data analysis.
I have an xarray dataset called cum_magn_ds
that looks like this:
xarray.Dataset
Dimensions:
time: 16, y: 3001, x: 3000
Coordinates:
x (x) float64 3.942e+05 3.942e+05 ... 4.392e+05
y (y) float64 6.673e+06 6.673e+06 ... 6.628e+06
time (time) datetime64[ns] 2000-03-31 ... 2024-01-13
Data variables:
cum_disp (time, y, x) float64 nan nan nan nan ... nan nan nan nan
Attributes:
AREA_OR_POINT : Area
STATISTICS_APPROXIMATE : YES
STATISTICS_MAXIMUM : 61.55648635672
STATISTICS_MEAN : 2.2860794614348
STATISTICS_MINIMUM : 0.0048748816705561
STATISTICS_STDDEV : 2.9354504034697
STATISTICS_VALID_PERCENT : 76.11
On the other hand, I have a GeoPackage file containing 200 polygons. The GeoDataFrame, called selected_gdf,
looks like this:
area_m2 id geometry
53 100636.350443 385 MULTIPOLYGON (((412065.000 6663985.000, 412065...
56 79473.343769 401 MULTIPOLYGON (((411855.000 6663685.000, 411855...
167 754674.777833 1280 MULTIPOLYGON (((415935.000 6645895.000, 415935...
What I want to do is:
– iterate through each polygon, clip the xarray dataset with it,
and compute basic statistics (the mean, for instance). When I do this with a single polygon, it works and I get good results (see the code below):
# Getting the biggest polygon
single_poly_gdf = gdf.loc[[gdf['area_m2'].idxmax()]]
# Clipping with rio.clip on the xarray dataset
cum_magn_ds.rio.clip(single_poly_gdf.geometry.values, single_poly_gdf.crs).cum_disp.mean(dim=["x", "y"], skipna=True)
The result:
xarray.DataArray 'cum_disp' (time: 16)
array([ 1.58297256, 1.73710292, 3.53712282, 4.27412698, 19.88154365,
20.90266202, 21.16476275, 21.99914255, 24.88651852, 25.80484704,
26.92583463, 28.62922104, 29.28387035, 31.90287332, 32.59387569,
31.56232418])
Coordinates:
time (time) datetime64[ns] 2000-03-31 ... 2024-01-13
spatial_ref () int64 0
Attributes: (0)
The type of single_poly_gdf.geometry.values is geopandas.array.GeometryArray.
The problem arises when I want to work with the full GeoDataFrame (200 polygons).
I have problems getting the geometry for each polygon (row) and passing it to cum_magn_ds.rio.clip(...).
I tried the following code, but I have problems with the geometry:
results_from_three = []
for row in selected_gdf.iterfeatures():
    # print(type(row))
    # print(row['geometry']['coordinates'])
    aoi_coords = row['geometry']['coordinates']
    # print(aoi_coords)
    aoi = cum_magn_ds.rio.clip(aoi_coords)
    mp = aoi.cum_disp.mean(dim=["x", "y"], skipna=True)
    results_from_three.append(mp)

# Concatenate results into a single DataFrame
df = xr.concat(results_from_three, dim='polygon').to_dataframe()
Do you have any suggestions for solving this problem, notably for getting the geometry of each row? I would also be very grateful for any suggestions to improve the code.
Thanks in advance