Collectives™ on Stack Overflow
Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
Use xarray to resample to lower spatial resolution
I want to resample my xarray object to a lower spatial resolution (LESS PIXELS).
import pandas as pd
import numpy as np
import xarray as xr
time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)
latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)
latitude, longitude = np.meshgrid(latitude, longitude)
BHR_SW = np.ones((365, 1200, 1200))
output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])
output_da = output_da.rename({'dim_0':'time','dim_1':'y','dim_2':'x'})
latitude_da = latitude_da.rename({'dim_0':'y','dim_1':'x'})
longitude_da = longitude_da.rename({'dim_0':'y','dim_1':'x'})
output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign({'latitude':latitude_da, 'longitude':longitude_da})
print(output_ds)
<xarray.Dataset>
Dimensions: (time: 365, x: 1200, y: 1200)
Coordinates:
* time (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
* y (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
* x (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
BHR_SW (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
latitude (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
longitude (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56
My question is, how to I resample the following by the x,y coordinates to a 200x200 grid?
This is a REDUCING the spatial resolution of the variable.
What I have tried is the following:
output_ds.resample(x=200).mean()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-10fbdf855a5d> in <module>()
----> 1 output_ds.resample(x=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
701 group = DataArray(dim_coord, coords=dim_coord.coords,
702 dims=dim_coord.dims, name=RESAMPLE_DIM)
--> 703 grouper = pd.Grouper(freq=freq, closed=closed, label=label, base=base)
704 resampler = self._resample_cls(self, group=group, dim=dim_name,
705 grouper=grouper,
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/core/resample.pyc in __init__(self, freq, closed, label, how, axis, fill_method, limit, loffset, kind, convention, base, **kwargs)
1198 .format(convention))
-> 1200 freq = to_offset(freq)
1202 end_types = set(['M', 'A', 'Q', 'BM', 'BA', 'BQ', 'W'])
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/tseries/frequencies.pyc in to_offset(freq)
174 delta = delta + offset
175 except Exception:
--> 176 raise ValueError(libfreqs._INVALID_FREQ_ERROR.format(freq))
178 if delta is None:
ValueError: Invalid frequency: 200
But I get the error shown.
How can I complete this spatial resampling for x and y?
Ideally I want to do this:
output_ds.resample(x=200, y=200).mean()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-55-e0bfce19e037> in <module>()
----> 1 output_ds.resample(x=200, y=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
679 if len(indexer) != 1:
680 raise ValueError(
--> 681 "Resampling only supported along single dimensions."
682 )
683 dim, freq = indexer.popitem()
ValueError: Resampling only supported along single dimensions.
NOTE: Real data has different behaviour
this on the test data I have created above. On the real data read in from a netcdf file
<xarray.Dataset>
Dimensions: (time: 368, x: 1200, y: 1200)
Coordinates:
* time (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-28
Dimensions without coordinates: x, y
Data variables:
latitude (y, x) float32 ...
longitude (y, x) float32 ...
Data_Mask (y, x) float32 ...
BHR_SW (time, y, x) float32 ...
Attributes:
CDI: Climate Data Interface version 1.9.5 (http://mpimet.mp...
Conventions: CF-1.4
history: Fri Dec 07 13:29:13 2018: cdo mergetime GLOBALBEDO/Glo...
content: extracted variabel BHR_SW of the original GlobAlbedo (...
metadata_profile: beam
metadata_version: 0.5
CDO: Climate Data Operators version 1.9.5 (http://mpimet.mp...
I have tried a similar thing:
ds.resample(x=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
686 dim_coord = self[dim]
--> 688 if isinstance(self.indexes[dim_name], CFTimeIndex):
689 raise NotImplementedError(
690 'Resample is currently not supported along a dimension '
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/coordinates.pyc in __getitem__(self, key)
309 if key not in self._sizes:
310 raise KeyError(key)
--> 311 return self._variables[key].to_index()
313 def __unicode__(self):
KeyError: 'x'
Any help very much appreciated.
Recently the coarsen method has been added to xarray and I think it's the best way for spatially downsampling, even though it's not possible to use it setting a desired final resolution and have it computed automatically.
Coarsen will perform an operation (mean, max, min, etc) over non-overlapping windows and depending on the window size you set you will get your desired final resolution.
Original input data from the author:
import pandas as pd
import numpy as np
import xarray as xr
time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)
latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)
latitude, longitude = np.meshgrid(latitude, longitude)
BHR_SW = np.ones((365, 1200, 1200))
output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])
output_da = output_da.rename({'dim_0':'time','dim_1':'y','dim_2':'x'})
latitude_da = latitude_da.rename({'dim_0':'y','dim_1':'x'})
longitude_da = longitude_da.rename({'dim_0':'y','dim_1':'x'})
output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign({'latitude':latitude_da, 'longitude':longitude_da})
print(output_ds)
<xarray.Dataset>
Dimensions: (time: 365, x: 1200, y: 1200)
Coordinates:
* time (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
* y (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
* x (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
BHR_SW (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
latitude (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
longitude (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56
Coarsen method to reduce spatial resolution from 1200x1200 to 200x200, we need 6x6 windows.
output_ds.coarsen(x=6).mean().coarsen(y=6).mean()
# or output_ds.coarsen(x=6,y=6).mean()
<xarray.Dataset>
Dimensions: (time: 365, x: 200, y: 200)
Coordinates:
* time (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
* y (y) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
* x (x) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
Data variables:
BHR_SW (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
latitude (y, x) float64 40.02 40.07 40.12 40.17 ... 49.88 49.93 49.98
longitude (y, x) float64 0.03244 0.03244 0.03244 ... 15.52 15.52 15.52
Update
@clausmichele's answer using coarsen
is now the best way to do this. Note that coarsen now includes the ability to specify desired output coordinates.
Original post
As piman314 suggests, groupby is the only way to do this in xarray. Resample can only be used for datetime coordinates.
Since xarray currently does not handle multidimensional groupby, this has to be done in two stages:
# this results in bin centers on 100, 300, ...
reduced = (
output_ds
.groupby(((output_ds.x//200) + 0.5) * 200)
.mean(dim='x')
.groupby(((output_ds.y//200) + 0.5) * 200)
.mean(dim='y'))
If you simply want to downsample your data, you can use positional slicing:
output_ds[:, ::200, ::200]
or, using named dims:
output_ds[{'x': slice(None, None, 200), 'y': slice(None, None, 200)}]
Finally, there are other packages out there that are specifically designed for fast regridding compatible with xarray. xESMF is a good one.
–
As you are using a NetCDF
file which already was manipulated with CDOs you could also use either CDOs SAMPLEGRID
function or NCOs bilinear_interp
function:
SAMPLEGRID
(https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf) does not interpolate, it just removes every n-th grid point.
bilinear_interp
(http://nco.sourceforge.net/nco.html#Bilinear-interpolation) does interpolation.
As you probably want mean, max, whatever albedo values you probably would prefer NCOs bilinear_interp
. But CDOs SAMPLEGRID
can give you the grid_out
you need for NOCs bilinear_interp
.
To do it using xarray
the most obvious way is to use groupby_bins
, however it turns out this is incredibly slow. It's probably much more effecient to drop into numpy
and use the superfast indexing ([:, :, frequency]
)
nsamples = 200
bins = np.linspace(output_ds.x.min(),
output_ds.x.max(), nsamples).astype(int)
output_ds = output_ds.groupby_bins('x', bins).first()
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.