I am currently working with atmospheric data that came from multiple files separated in time (so the time series is discontinuous). I loaded the netCDF files with xarray's open_mfdataset function and combined them “by_coords”. However, when I open the Dataset, every data variable depends on time only. Provided is a snippet of what the data looks like when opened. The main variables I am concerned with here are “pres” (pressure) and “tdry” (temperature), but the Dataset contains many others as well, such as wind components, wind speed/direction and relative humidity. My problem is that I want each data variable to also depend on the coordinate shown here as “gpsalt” (altitude/height). For example, each temperature value currently has only a time attached to it, and I would like the corresponding altitude value to be attached to that same temperature value as well.
What I have tried so far is adding the gpsalt variable to the Dataset as a new, expanded dimension labelled “altitude”: I took the gpsalt values as an array and passed them to expand_dims. I’m assuming (possibly incorrectly) that this takes the index of each gpsalt value and aligns it with the existing “time” dimension.
```python
# ds was opened earlier with xr.open_mfdataset(..., combine="by_coords")
gpsalt_data = ds['gpsalt'].values
ds = ds.expand_dims(altitude=gpsalt_data, axis=1)
ds
```
With this, I get a dataset in which “altitude” has been added as a dimension of the data variables.
I’m assuming that the time and altitude values now “match up” exactly where they should, i.e. that the 5th time index corresponds to the 5th altitude index, so that the temperature value at index [5, 5] is the correct one. I don’t know whether this is actually the case, though. The next step I want to take is to resample the Dataset to a new altitude resolution. To do so, I’m using xarray’s Dataset.interp method:
```python
import numpy as np

# new_time is not defined in this snippet
new_gpsalt = np.linspace(ds.gpsalt.min().compute(), ds.gpsalt.max().compute(), new_time.size)
ds = ds.interp(altitude=new_gpsalt, method='linear')
```
But I get the following error:
```
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
```
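A small, self-contained reproduction may make the behaviour easier to see. The values below are made up (four time steps, with one altitude repeated) and only mimic the structure described above. The sketch suggests that expand_dims creates a new, independent “altitude” axis, copying each tdry value to every altitude rather than pairing the n-th time with the n-th altitude, and that a repeated altitude value leaves the new index non-unique, which is the condition the error message complains about.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: four time steps, with the 100 m altitude repeated.
time = pd.date_range("2020-01-01", periods=4, freq="1s")
ds = xr.Dataset(
    {
        "tdry": ("time", np.array([15.0, 14.0, 14.5, 13.0])),
        "gpsalt": ("time", np.array([0.0, 100.0, 100.0, 200.0])),
    },
    coords={"time": time},
)

# Same call as in the question: altitude becomes a new, independent dimension.
expanded = ds.expand_dims(altitude=ds["gpsalt"].values, axis=1)
print(expanded["tdry"].dims)   # ('time', 'altitude')
print(expanded["tdry"].shape)  # (4, 4)

# Each tdry value is repeated across every altitude, not paired with one.
print(expanded["tdry"].isel(time=0).values)  # [15. 15. 15. 15.]

# The duplicated 100 m value makes the new altitude index non-unique...
print(expanded.indexes["altitude"].is_unique)  # False

# ...which is what interp runs into when it looks up positions in that index.
try:
    expanded.interp(altitude=np.linspace(0.0, 200.0, 5))
except Exception as err:
    print(f"{type(err).__name__}: {err}")
```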
The end goal is to resample and interpolate the atmospheric data, which are discontinuous in time, in both height and time; that is, I want to fill in the gaps between times by interpolating between data values at similar heights. Could anyone offer some insight into whether this is the correct way to go about it?
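One possible approach, sketched here under assumptions rather than as a confirmed solution: treat each continuous stretch of data as one profile, drop samples with missing or repeated altitudes so that gpsalt can serve as a unique, sorted dimension coordinate, interpolate each profile onto a common altitude grid, and then interpolate between profiles in time. The helper names (to_altitude_grid, make_profile), the toy values, and the choice to label each profile by its first timestamp are all hypothetical. Dataset.interp also requires SciPy to be installed.

```python
import numpy as np
import pandas as pd
import xarray as xr


def to_altitude_grid(profile: xr.Dataset, new_alt: np.ndarray) -> xr.Dataset:
    """Interpolate one time-indexed profile onto a fixed altitude grid.

    Hypothetical helper; assumes a single 'time' dimension and a numeric
    'gpsalt' data variable. Requires SciPy (used by Dataset.interp).
    """
    # Keep only samples that have a finite altitude.
    finite = np.flatnonzero(np.isfinite(profile["gpsalt"].values))
    profile = profile.isel(time=finite)
    # Keep the first sample at each altitude; np.unique returns first-occurrence
    # indices in ascending-altitude order, so the new index is unique and sorted.
    _, first = np.unique(profile["gpsalt"].values, return_index=True)
    profile = profile.isel(time=first)
    # Make gpsalt the dimension coordinate and drop the per-sample time stamps.
    profile = profile.swap_dims({"time": "gpsalt"}).drop_vars("time")
    return profile.interp(gpsalt=new_alt, method="linear")


def make_profile(start: str, alt, tdry) -> xr.Dataset:
    """Build a hypothetical toy profile sampled once per second."""
    time = pd.date_range(start, periods=len(alt), freq="1s")
    return xr.Dataset(
        {
            "gpsalt": ("time", np.asarray(alt, dtype=float)),
            "tdry": ("time", np.asarray(tdry, dtype=float)),
        },
        coords={"time": time},
    )


# Two made-up profiles one hour apart; the first repeats the 100 m altitude.
profiles = [
    make_profile("2020-01-01T00:00", [0, 100, 100, 200], [15, 14, 14.5, 13]),
    make_profile("2020-01-01T01:00", [0, 50, 200], [17, 16.5, 15]),
]
new_alt = np.linspace(0.0, 200.0, 5)  # 0, 50, 100, 150, 200

# Stack the gridded profiles along time, labelling each by its first timestamp.
starts = pd.DatetimeIndex([p["time"].values[0] for p in profiles], name="time")
grid = xr.concat([to_altitude_grid(p, new_alt) for p in profiles], dim=starts)

# Interpolate in time at fixed altitudes to fill the gap between profiles.
new_times = pd.date_range("2020-01-01T00:00", "2020-01-01T01:00", freq="30min")
result = grid.interp(time=new_times)
print(result["tdry"].sel(time=pd.Timestamp("2020-01-01T00:30")).values)
# expected values: 16, 15.5, 15, 14.5, 14
```

Dropping repeated altitudes (keeping the first sample) is only one option; averaging the samples at each repeated altitude would be another, and which is appropriate depends on the instrument.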