I am working with atmospheric data that is stored in Xarray Datasets.
However, when I try to load the numerical values into NumPy arrays for further computations, it takes a very long time 🙁
In fact, the Jupyter notebook kernel always dies before finishing – I suspect it runs out of memory, even though the machine has 256 GB of RAM.
Does anyone know how I can do this more efficiently?
Thanks!
I have tried loading the data year by year, but it is still too slow and does not finish, apparently because it runs out of memory.
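To illustrate what I mean by "year by year": the idea is to process the big array in time slabs so only one slab lives in memory at a time. A tiny self-contained sketch with synthetic NumPy data (all sizes and names here are made up for illustration, not taken from the real dataset):

```python
import numpy as np

# Tiny synthetic stand-in for the real (43393, 654400) array:
# each "year" becomes a small slab along the time axis.
rng = np.random.default_rng(0)
n_time, n_cell = 12, 100
u = rng.normal(size=(n_time, n_cell))
v = rng.normal(size=(n_time, n_cell))

# Process in fixed-size time slabs so peak memory stays bounded
slab = 4
means = []
for start in range(0, n_time, slab):
    sl = slice(start, start + slab)
    ws = np.sqrt(u[sl] ** 2 + v[sl] ** 2).astype(np.float32)  # downcast early
    means.append(ws.mean(axis=0))

# Slabs are equal-sized here, so the mean of slab means equals the full mean
result = np.mean(means, axis=0)
print(result.shape)  # (100,)
```

Even with this pattern, one year of my data is still tens of gigabytes, which is why I am asking whether there is a better way.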
Here is the code:
#################################
# CATALOG LOAD
################################
import intake
import numpy as np

link = "link to intake catalog"
cat = intake.open_catalog(link)
cat1 = "28kmresolution"
cat2 = "2D_1h_native_data"
ds = cat.IFS[cat1][cat2]  # xarray Dataset from the intake catalog
#################################
# VARIABLE LOAD
################################
wind_speed = np.sqrt(ds['10u']**2 + ds['10v']**2) # 10m above surface wind speed
#################################
# VARIABLE INFORMATION
################################
print(wind_speed.shape)  # (43393, 654400)
print(type(wind_speed))  # <class 'xarray.core.dataarray.DataArray'>
gigabytes = int(round(wind_speed.nbytes / 1e9))
print(gigabytes)  # 227 GB in float64
#################################
# LOAD THE VALUES INTO NUMPY ARRAY
################################
ws2023 = wind_speed.sel(time="2023")
print(ws2023.shape)  # (8670, 654400) – about 45 GB in float64
#----------------------------------
# And here is where the kernel shuts down
wind_speed_values = ws2023.values
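For reference, this is the kind of lazy, chunked pattern I think I should be using instead (a minimal sketch with tiny synthetic data, since I cannot share the catalog; it assumes xarray and dask are installed, and none of the names or sizes below come from the real dataset):

```python
import numpy as np
import xarray as xr

# Tiny synthetic stand-ins for the 10u / 10v fields, chunked along time
# so that computations build a lazy dask graph instead of loading data.
rng = np.random.default_rng(0)
dims = ("time", "cell")
u10 = xr.DataArray(rng.normal(size=(10, 100)), dims=dims).chunk({"time": 2})
v10 = xr.DataArray(rng.normal(size=(10, 100)), dims=dims).chunk({"time": 2})

# Still lazy: this only records the operations, nothing is computed yet
ws = np.sqrt(u10 ** 2 + v10 ** 2).astype("float32")  # float32 halves memory

# Reduce before materialising: only one chunk at a time is held in memory
ws_mean = ws.mean("time").compute()
print(ws_mean.shape)  # (100,)
```

The key difference from my code above would be reducing (or writing to disk) chunk by chunk instead of calling `.values` on the whole selection at once.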