python - Resample xarray object to lower resolution spatially

Use xarray to resample to lower spatial resolution

I want to resample my xarray object to a lower spatial resolution (LESS PIXELS).
import pandas as pd
import numpy as np
import xarray as xr
time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)
latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)
latitude, longitude = np.meshgrid(latitude, longitude)
BHR_SW = np.ones((365, 1200, 1200))
output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])
output_da = output_da.rename({'dim_0':'time','dim_1':'y','dim_2':'x'})
latitude_da = latitude_da.rename({'dim_0':'y','dim_1':'x'})
longitude_da = longitude_da.rename({'dim_0':'y','dim_1':'x'})
output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign({'latitude':latitude_da, 'longitude':longitude_da})
print(output_ds)
<xarray.Dataset>
Dimensions:    (time: 365, x: 1200, y: 1200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
  * y          (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
  * x          (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
    BHR_SW     (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    latitude   (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
    longitude  (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56
My question is: how do I resample the dataset along the x and y coordinates to a 200x200 grid?
This means REDUCING the spatial resolution of the variable.
What I have tried is the following:
output_ds.resample(x=200).mean()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-10fbdf855a5d> in <module>()
----> 1 output_ds.resample(x=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
    701         group = DataArray(dim_coord, coords=dim_coord.coords,
    702                           dims=dim_coord.dims, name=RESAMPLE_DIM)
--> 703         grouper = pd.Grouper(freq=freq, closed=closed, label=label, base=base)
    704         resampler = self._resample_cls(self, group=group, dim=dim_name,
    705                                        grouper=grouper,
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/core/resample.pyc in __init__(self, freq, closed, label, how, axis, fill_method, limit, loffset, kind, convention, base, **kwargs)
   1198                              .format(convention))
-> 1200         freq = to_offset(freq)
   1202         end_types = set(['M', 'A', 'Q', 'BM', 'BA', 'BQ', 'W'])
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/tseries/frequencies.pyc in to_offset(freq)
    174                     delta = delta + offset
    175         except Exception:
--> 176             raise ValueError(libfreqs._INVALID_FREQ_ERROR.format(freq))
    178     if delta is None:
ValueError: Invalid frequency: 200
But I get the error shown.
How can I complete this spatial resampling for x and y?
 Ideally I want to do this:
output_ds.resample(x=200, y=200).mean()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-e0bfce19e037> in <module>()
----> 1 output_ds.resample(x=200, y=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
    679         if len(indexer) != 1:
    680             raise ValueError(
--> 681                 "Resampling only supported along single dimensions."
    682             )
    683         dim, freq = indexer.popitem()
ValueError: Resampling only supported along single dimensions.
NOTE: Real data has different behaviour
The errors above are from the test data I created. The real data, read in from a netCDF file, looks like this:
<xarray.Dataset>
Dimensions:    (time: 368, x: 1200, y: 1200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-28
Dimensions without coordinates: x, y
Data variables:
    latitude   (y, x) float32 ...
    longitude  (y, x) float32 ...
    Data_Mask  (y, x) float32 ...
    BHR_SW     (time, y, x) float32 ...
Attributes:
    CDI:               Climate Data Interface version 1.9.5 (http://mpimet.mp...
    Conventions:       CF-1.4
    history:           Fri Dec 07 13:29:13 2018: cdo mergetime GLOBALBEDO/Glo...
    content:           extracted variabel BHR_SW of the original GlobAlbedo (...
    metadata_profile:  beam
    metadata_version:  0.5
    CDO:               Climate Data Operators version 1.9.5 (http://mpimet.mp...
I have tried a similar thing:
ds.resample(x=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
    686         dim_coord = self[dim]
--> 688         if isinstance(self.indexes[dim_name], CFTimeIndex):
    689             raise NotImplementedError(
    690                 'Resample is currently not supported along a dimension '
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/coordinates.pyc in __getitem__(self, key)
    309         if key not in self._sizes:
    310             raise KeyError(key)
--> 311         return self._variables[key].to_index()
    313     def __unicode__(self):
KeyError: 'x'
Any help very much appreciated.
The coarsen method was recently added to xarray, and I think it is the best way to downsample spatially, even though you cannot give it a target resolution and have the window size computed automatically.
coarsen applies an operation (mean, max, min, etc.) over non-overlapping windows, so the window size you choose determines the final resolution.
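To make the non-overlapping windows concrete, here is a minimal sketch on a toy array. The names `toy` and `coarse` are illustrative and not part of the question's data.

```python
import numpy as np
import xarray as xr

# Toy 2x6 array with integer coordinates (illustrative data only).
toy = xr.DataArray(
    np.arange(12.0).reshape(2, 6),
    dims=("y", "x"),
    coords={"y": [0, 1], "x": np.arange(6)},
)

# Average non-overlapping windows of 3 pixels along x: 6 columns become 2.
coarse = toy.coarsen(x=3).mean()

print(coarse.values)    # rows are [1, 4] and [7, 10]
print(coarse.x.values)  # window-mean coordinates: 1 and 4
```

By default the coordinate of each window is also averaged, which is why the coarsened x values fall between the original pixel positions.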
Original input data from the author:
import pandas as pd
import numpy as np
import xarray as xr
time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)
latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)
latitude, longitude = np.meshgrid(latitude, longitude)
BHR_SW = np.ones((365, 1200, 1200))
output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])
output_da = output_da.rename({'dim_0':'time','dim_1':'y','dim_2':'x'})
latitude_da = latitude_da.rename({'dim_0':'y','dim_1':'x'})
longitude_da = longitude_da.rename({'dim_0':'y','dim_1':'x'})
output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign({'latitude':latitude_da, 'longitude':longitude_da})
print(output_ds)
<xarray.Dataset>
Dimensions:    (time: 365, x: 1200, y: 1200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
  * y          (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
  * x          (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
    BHR_SW     (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    latitude   (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
    longitude  (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56
To reduce the spatial resolution from 1200x1200 to 200x200 with coarsen, we need 6x6 windows:
output_ds.coarsen(x=6).mean().coarsen(y=6).mean()
# or output_ds.coarsen(x=6,y=6).mean()
<xarray.Dataset>
Dimensions:    (time: 365, x: 200, y: 200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
  * y          (y) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
  * x          (x) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
Data variables:
    BHR_SW     (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    latitude   (y, x) float64 40.02 40.07 40.12 40.17 ... 49.88 49.93 49.98
    longitude  (y, x) float64 0.03244 0.03244 0.03244 ... 15.52 15.52 15.52
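Since coarsen takes a window size rather than a target size, the window can be derived from the target. The sketch below assumes a hypothetical helper, `coarsen_to`, which is not part of xarray. It uses `boundary="trim"` to drop leftover pixels when a dimension is not an exact multiple of the window, so the result only approximates the target size in that case. The toy dataset has no x/y coordinates, like the real file in the question.

```python
import numpy as np
import xarray as xr


def coarsen_to(ds, target, dims=("y", "x")):
    """Block-average `ds` so each dimension in `dims` has roughly `target` points.

    Hypothetical helper for illustration; not an xarray function.
    """
    # Integer window size per dimension, at least 1.
    windows = {dim: max(ds.sizes[dim] // target, 1) for dim in dims}
    # boundary="trim" discards pixels that do not fill a complete window.
    return ds.coarsen(windows, boundary="trim").mean()


# Small stand-in for the real data: 2 time steps, no x/y coordinates.
small = xr.Dataset({"BHR_SW": (("time", "y", "x"), np.ones((2, 1200, 1200)))})

result = coarsen_to(small, 200)
print(dict(result.sizes))
```

Here 1200 // 200 gives a window of 6, so the result has 200 points along both x and y.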
Update
@clausmichele's answer using coarsen is now the best way to do this. Note that coarsen now includes the ability to specify desired output coordinates.
Original post
As piman314 suggests, groupby is the only way to do this in xarray. Resample can only be used for datetime coordinates.
Since xarray currently does not handle multidimensional groupby, this has to be done in two stages:
# 6-pixel bins reduce 1200 pixels to 200 per axis;
# the bin labels are 3, 9, 15, ...
reduced = (
    output_ds
    .groupby(((output_ds.x // 6) + 0.5) * 6)
    .mean(dim='x')
    .groupby(((output_ds.y // 6) + 0.5) * 6)
    .mean(dim='y'))
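For contrast with the spatial case, this is the kind of call resample is designed for: a datetime coordinate and a pandas offset alias. It is a minimal sketch on a synthetic series, and the names `series` and `monthly` are illustrative.

```python
import numpy as np
import pandas as pd
import xarray as xr

# 8-daily time axis, as in the question.
time = pd.date_range("1998-01-02", "2005-12-28", freq="8D")
series = xr.DataArray(np.ones(time.size), coords={"time": time}, dims="time")

# "MS" (month start) is a pandas offset alias; a bare integer such as 200
# is not a valid frequency, which is what the ValueError in the question reports.
monthly = series.resample(time="MS").mean()
print(monthly.sizes["time"])
```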
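If you simply want to downsample your data, you can use positional slicing on the DataArray (a Dataset cannot be indexed positionally). A step of 6 keeps 200 of the 1200 pixels per axis:
output_ds['BHR_SW'][:, ::6, ::6]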
or, using named dims (this also works on the whole Dataset):
output_ds[{'x': slice(None, None, 6), 'y': slice(None, None, 6)}]
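Strided slicing keeps individual pixels rather than averaging them, which matters when the data vary within a window. A minimal sketch with made-up alternating values shows the difference from a block mean:

```python
import numpy as np
import xarray as xr

# Alternating 0/10 signal along x (illustrative data only).
da = xr.DataArray(np.tile([0.0, 10.0], 3), dims="x", coords={"x": np.arange(6)})

strided = da.isel(x=slice(None, None, 2))  # keeps pixels 0, 2, 4
averaged = da.coarsen(x=2).mean()          # averages pairs (0,1), (2,3), (4,5)

print(strided.values)   # every kept pixel is 0
print(averaged.values)  # every window mean is 5
```

Slicing is cheaper, but it discards the pixels in between; the block mean uses all of them.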
Finally, there are other packages out there that are specifically designed for fast regridding compatible with xarray. xESMF is a good one.
How does one specify desired output coordinates? My reading of it is that you still pass a window for each dimension you want coarsened.
– HoosierDaddy, Apr 2 at 19:29
Since your NetCDF file has already been processed with CDO, you could also use either CDO's SAMPLEGRID function or NCO's bilinear_interp function:
SAMPLEGRID (https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf) does not interpolate; it just removes every n-th grid point.
bilinear_interp (http://nco.sourceforge.net/nco.html#Bilinear-interpolation) does interpolation.
Since you probably want mean, max, or similar albedo values, you would probably prefer NCO's bilinear_interp. However, CDO's SAMPLEGRID can give you the grid_out you need for NCO's bilinear_interp.
To do it using xarray, the most obvious way is to use groupby_bins; however, it turns out this is incredibly slow. It's probably much more efficient to drop into numpy and use its very fast strided indexing (e.g. [:, ::frequency, ::frequency]).
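nsamples = 200
# nsamples bins need nsamples + 1 bin edges
bins = np.linspace(output_ds.x.min().item(),
                   output_ds.x.max().item(), nsamples + 1)
# include_lowest=True keeps the first pixel (x == 0) in the first bin
output_ds = output_ds.groupby_bins('x', bins, include_lowest=True).first()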
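If a block mean is wanted instead of picking one pixel per bin, dropping into NumPy can look like the sketch below. `block_mean` is a hypothetical helper, and it assumes an array laid out as (time, y, x) whose spatial sizes are exact multiples of the factor. For the question's data the input would be something like `output_ds.BHR_SW.values` with a factor of 6.

```python
import numpy as np


def block_mean(arr, factor):
    """Average factor x factor blocks over the last two axes of a 3-D array.

    Hypothetical helper for illustration.
    """
    t, ny, nx = arr.shape
    if ny % factor or nx % factor:
        raise ValueError("spatial dimensions must be divisible by factor")
    # Split each spatial axis into (number of blocks, pixels per block).
    blocks = arr.reshape(t, ny // factor, factor, nx // factor, factor)
    # Average over the two within-block axes.
    return blocks.mean(axis=(2, 4))


# Small synthetic array: 2 time steps on a 12x12 grid.
data = np.arange(2 * 12 * 12, dtype=float).reshape(2, 12, 12)
coarse = block_mean(data, 6)
print(coarse.shape)  # (2, 2, 2)
```

The reshape creates no copy for a contiguous array, so the cost is essentially that of one mean over the data.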