Resampling an xarray object to a lower spatial resolution

Question 1

Use xarray to resample to lower spatial resolution

I want to resample my xarray object to a lower spatial resolution (LESS PIXELS).

import pandas as pd
import numpy as np
import xarray as xr
time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)
latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)
latitude, longitude = np.meshgrid(latitude, longitude)
BHR_SW = np.ones((365, 1200, 1200))
output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])
output_da = output_da.rename({'dim_0':'time','dim_1':'y','dim_2':'x'})
latitude_da = latitude_da.rename({'dim_0':'y','dim_1':'x'})
longitude_da = longitude_da.rename({'dim_0':'y','dim_1':'x'})
output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign({'latitude':latitude_da, 'longitude':longitude_da})
print(output_ds)
<xarray.Dataset>
Dimensions:    (time: 365, x: 1200, y: 1200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
  * y          (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
  * x          (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
    BHR_SW     (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    latitude   (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
    longitude  (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56
My question is: how do I resample this by the x, y coordinates to a 200x200 grid?
In other words, I want to reduce the spatial resolution of the variables.
What I tried is the following:
output_ds.resample(x=200).mean()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-10fbdf855a5d> in <module>()
----> 1 output_ds.resample(x=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
    701         group = DataArray(dim_coord, coords=dim_coord.coords,
    702                           dims=dim_coord.dims, name=RESAMPLE_DIM)
--> 703         grouper = pd.Grouper(freq=freq, closed=closed, label=label, base=base)
    704         resampler = self._resample_cls(self, group=group, dim=dim_name,
    705                                        grouper=grouper,
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/core/resample.pyc in __init__(self, freq, closed, label, how, axis, fill_method, limit, loffset, kind, convention, base, **kwargs)
   1198                              .format(convention))
-> 1200         freq = to_offset(freq)
   1202         end_types = set(['M', 'A', 'Q', 'BM', 'BA', 'BQ', 'W'])
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/tseries/frequencies.pyc in to_offset(freq)
    174                     delta = delta + offset
    175         except Exception:
--> 176             raise ValueError(libfreqs._INVALID_FREQ_ERROR.format(freq))
    178     if delta is None:
ValueError: Invalid frequency: 200
But I get the error shown above.
How can I do this kind of spatial resampling over x and y?
Ideally I want to do this:
output_ds.resample(x=200, y=200).mean()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-e0bfce19e037> in <module>()
----> 1 output_ds.resample(x=200, y=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
    679         if len(indexer) != 1:
    680             raise ValueError(
--> 681                 "Resampling only supported along single dimensions."
    682             )
    683         dim, freq = indexer.popitem()
ValueError: Resampling only supported along single dimensions.
NOTE: Real data has different behaviour
These are the results on the test data I created above. On the real data, read in from a netCDF file:
<xarray.Dataset>
Dimensions:    (time: 368, x: 1200, y: 1200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-28
Dimensions without coordinates: x, y
Data variables:
    latitude   (y, x) float32 ...
    longitude  (y, x) float32 ...
    Data_Mask  (y, x) float32 ...
    BHR_SW     (time, y, x) float32 ...
Attributes:
    CDI:               Climate Data Interface version 1.9.5 (http://mpimet.mp...
    Conventions:       CF-1.4
    history:           Fri Dec 07 13:29:13 2018: cdo mergetime GLOBALBEDO/Glo...
    content:           extracted variabel BHR_SW of the original GlobAlbedo (...
    metadata_profile:  beam
    metadata_version:  0.5
    CDO:               Climate Data Operators version 1.9.5 (http://mpimet.mp...
I also tried something similar:
ds.resample(x=200).mean()
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
    686         dim_coord = self[dim]
--> 688         if isinstance(self.indexes[dim_name], CFTimeIndex):
    689             raise NotImplementedError(
    690                 'Resample is currently not supported along a dimension '
/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/coordinates.pyc in __getitem__(self, key)
    309         if key not in self._sizes:
    310             raise KeyError(key)
--> 311         return self._variables[key].to_index()
    313     def __unicode__(self):
KeyError: 'x'
Any help is much appreciated.

Question 2

Update
@clausmichele's answer using coarsen is now the best way to do this. Note that coarsen now includes the ability to specify the desired output coordinates.
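A minimal sketch, assuming the coordinate feature mentioned above refers to coarsen's coord_func argument in recent xarray versions: by default each coordinate is averaged within its window, but another reduction (here "min", so each block is labelled by its first pixel) can be chosen instead. The 12x12 toy dataset is made up for illustration.

```python
import numpy as np
import xarray as xr

# Toy 12x12 dataset with integer pixel coordinates (illustrative only)
ds = xr.Dataset(
    {"BHR_SW": (("y", "x"), np.arange(144, dtype=float).reshape(12, 12))},
    coords={"y": np.arange(12), "x": np.arange(12)},
)

# Average the data over non-overlapping 3x3 blocks; coord_func="min" makes the
# output x/y coordinates the first pixel index of each block instead of the mean
coarse = ds.coarsen(x=3, y=3, coord_func="min").mean()
```

With the default coord_func="mean", the same call would label the blocks 1, 4, 7, 10 instead.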
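
Original post
As piman314 suggested, groupby is the only way to do this in xarray. Resample can only be used with datetime coordinates.
Since xarray does not currently handle multidimensional groupby, this has to be done in two stages: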
# 1200 pixels / 200 output cells = 6-pixel bins, with bin centers at 3, 9, 15, ...
reduced = (
    output_ds
    .groupby(((output_ds.x//6) + 0.5) * 6)   # stage 1: bin along x
    .mean(dim='x')
    .groupby(((output_ds.y//6) + 0.5) * 6)   # stage 2: bin along y
    .mean(dim='y'))
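To sanity-check the two-stage approach on something small, the sketch below compares it with a direct NumPy block mean. The 12x12 toy grid and the bin width of 3 are made up for illustration, and the bin keys are renamed to the hypothetical names x_bin and y_bin so they cannot clash with the original x and y dimensions (an assumption made for robustness across xarray versions, not something the answer above requires).

```python
import numpy as np
import xarray as xr

n = 3  # bin width in pixels (illustrative; 6 for the 1200 -> 200 case)
da = xr.DataArray(
    np.arange(144, dtype=float).reshape(12, 12),
    dims=("y", "x"),
    coords={"y": np.arange(12), "x": np.arange(12)},
)

# Bin-center keys, same ((coord // n) + 0.5) * n pattern as above
x_key = (((da.x // n) + 0.5) * n).rename("x_bin")
y_key = (((da.y // n) + 0.5) * n).rename("y_bin")

# Stage 1 averages within x bins, stage 2 within y bins
reduced = da.groupby(x_key).mean("x").groupby(y_key).mean("y")

# Reference: mean over non-overlapping n x n blocks via a reshape
ref = da.values.reshape(12 // n, n, 12 // n, n).mean(axis=(1, 3))
```

Because every block has the same number of pixels, the mean of the per-column means equals the block mean, so both results should agree.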
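If you just want to subsample your data, you can use positional slicing (this works on a DataArray, not on the whole Dataset):
output_ds.BHR_SW[:, ::6, ::6]
Or, using named dims (this also works on the whole Dataset):
output_ds[{'x': slice(None, None, 6), 'y': slice(None, None, 6)}]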
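A small sketch of the slicing options on a toy (time, y, x) dataset (sizes and values are illustrative). Strided selection keeps every n-th pixel and does no averaging; isel with slices and the dict-style indexing above give the same result, and positional slicing gives the same values when applied to a single DataArray.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"BHR_SW": (("time", "y", "x"), np.arange(288, dtype=float).reshape(2, 12, 12))},
    coords={"time": [0, 1], "y": np.arange(12), "x": np.arange(12)},
)

step = 3  # keep every 3rd pixel (illustrative)
a = ds.isel(x=slice(None, None, step), y=slice(None, None, step))
b = ds[{"x": slice(None, None, step), "y": slice(None, None, step)}]  # dict indexing on a Dataset
c = ds.BHR_SW[:, ::step, ::step]  # positional indexing on the DataArray
```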
Finally, there are some packages designed specifically for fast regridding that are compatible with xarray; xESMF is a good example.

Question 3

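Recently the coarsen method has been added to xarray, and I think it is the best way to do spatial downsampling, although you cannot give it a desired final resolution and let it work out the rest automatically.
Coarsen performs an operation (mean, max, min, etc.) over non-overlapping windows, so the window size you set determines the final resolution you get.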
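Since coarsen takes window sizes rather than a target resolution, the window has to be derived from the grid size. A minimal sketch, using a hypothetical helper coarsen_to (not part of xarray) that computes window = size // target and uses boundary="trim" so leftover pixels that don't fill a whole window are dropped:

```python
import numpy as np
import xarray as xr

def coarsen_to(obj, boundary="trim", **targets):
    """Hypothetical helper: block-average obj so each named dim has about
    targets[dim] cells. The window is size // target; with boundary="trim"
    leftover pixels are dropped, so the output length is size // window,
    which equals the target exactly when size is divisible by it."""
    windows = {dim: obj.sizes[dim] // n for dim, n in targets.items()}
    return obj.coarsen(windows, boundary=boundary).mean()

# Toy stand-in for the 1200 x 1200 grid in the question (values are made up)
da = xr.DataArray(
    np.ones((1200, 1200)),
    dims=("y", "x"),
    coords={"y": np.arange(1200), "x": np.arange(1200)},
)
out = coarsen_to(da, x=200, y=200)  # 1200 // 200 -> 6 x 6 windows
```

When the size is not divisible (say 13 pixels with a target of 4), the window becomes 3 and one pixel is trimmed, giving 4 cells.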

The author's original input data:
import pandas as pd
import numpy as np
import xarray as xr
time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)
latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)
latitude, longitude = np.meshgrid(latitude, longitude)
BHR_SW = np.ones((365, 1200, 1200))
output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])
output_da = output_da.rename({'dim_0':'time','dim_1':'y','dim_2':'x'})
latitude_da = latitude_da.rename({'dim_0':'y','dim_1':'x'})
longitude_da = longitude_da.rename({'dim_0':'y','dim_1':'x'})
output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign({'latitude':latitude_da, 'longitude':longitude_da})
print(output_ds)
<xarray.Dataset>
Dimensions:    (time: 365, x: 1200, y: 1200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
  * y          (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
  * x          (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
    BHR_SW     (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    latitude   (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
    longitude  (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56
Using coarsen: to reduce the spatial resolution from 1200x1200 to 200x200, we need 6x6 windows.
output_ds.coarsen(x=6).mean().coarsen(y=6).mean()
# or output_ds.coarsen(x=6,y=6).mean()
<xarray.Dataset>
Dimensions:    (time: 365, x: 200, y: 200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
  * y          (y) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
  * x          (x) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
Data variables:
    BHR_SW     (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    latitude   (y, x) float64 40.02 40.07 40.12 40.17 ... 49.88 49.93 49.98
    longitude  (y, x) float64 0.03244 0.03244 0.03244 ... 15.52 15.52 15.52
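A quick check that the chained and joint forms above agree, on a toy array that, like the real file in the question, has x and y dimensions without coordinate variables (coarsen works on positions, so it does not need the index that resample looked up and failed to find). Sizes and random values are illustrative.

```python
import numpy as np
import xarray as xr

# Toy (time, y, x) array with no x/y coordinates, mimicking the real netCDF data
rng = np.random.default_rng(0)
da = xr.DataArray(rng.random((4, 12, 12)), dims=("time", "y", "x"))

chained = da.coarsen(x=6).mean().coarsen(y=6).mean()  # two passes
joint = da.coarsen(x=6, y=6).mean()                   # one pass over 6x6 blocks
```

The two agree for the mean because each 6x6 block is split into equal-sized 1x6 pieces; for a statistic such as the median, the chained form generally gives a different answer, so the joint form is the safer default.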

Question 4

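Since you are using NetCDF files that have already been processed with CDO, you could also use the CDO SAMPLEGRID function or the NCO bilinear_interp function.
SAMPLEGRID (https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf) does not interpolate; it just removes every n-th grid point.
bilinear_interp (http://nco.sourceforge.net/nco.html#Bilinear-interpolation) does interpolate.
Since you probably want the mean, max, or some other aggregate of the albedo values, you may prefer NCO's bilinear_interp. However, CDO SAMPLEGRID can give you the grid_out that NCO's bilinear_interp needs.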
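For comparison from Python (this is not NCO's implementation): on a regular, rectilinear grid, bilinear interpolation can be sketched with NumPy as two passes of 1-D linear interpolation, first along x and then along y. The helper name bilinear_regrid and the toy grid are hypothetical.

```python
import numpy as np

def bilinear_regrid(field, x, y, new_x, new_y):
    """Hypothetical helper: bilinearly interpolate a 2-D field shaped (y, x)
    onto new 1-D axes. Assumes x and y are increasing; points outside the
    input range are clamped to edge values (np.interp behaviour)."""
    # Pass 1: interpolate every row along x -> shape (len(y), len(new_x))
    rows = np.array([np.interp(new_x, x, row) for row in field])
    # Pass 2: interpolate every column along y -> shape (len(new_x), len(new_y))
    cols = np.array([np.interp(new_y, y, col) for col in rows.T])
    return cols.T  # shape (len(new_y), len(new_x))

# Toy field that is linear in x and y, which bilinear interpolation reproduces exactly
x = np.arange(12.0)
y = np.arange(10.0)
field = 2 * x[None, :] + 3 * y[:, None]
new_x = np.linspace(0, 11, 4)
new_y = np.linspace(0, 9, 5)
out = bilinear_regrid(field, x, y, new_x, new_y)
```

Note that interpolation samples the field at the new points rather than aggregating all pixels in a cell, which is the distinction drawn above between interpolation and taking a mean or max.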

Question 5


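To do this with xarray, the most obvious way is to use groupby_bins; however, this turns out to be very slow. It may be more efficient to drop down to numpy and use its very fast indexing ([:, :, frequency]).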
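A minimal NumPy sketch of two fast paths, assuming the data is a plain (time, y, x) array whose spatial sizes are exact multiples of the factor (sizes here are illustrative): strided indexing keeps every n-th pixel without averaging, while a reshape computes block means without any groupby. The helper name block_mean is hypothetical.

```python
import numpy as np

def block_mean(arr, n):
    """Hypothetical helper: average (time, y, x) data over non-overlapping
    n x n blocks. Assumes y and x are exact multiples of n."""
    t, ny, nx = arr.shape
    return arr.reshape(t, ny // n, n, nx // n, n).mean(axis=(2, 4))

data = np.arange(2 * 12 * 12, dtype=float).reshape(2, 12, 12)
subsampled = data[:, ::6, ::6]   # every 6th pixel, no averaging
averaged = block_mean(data, 6)   # mean of each 6x6 block
```

The result can be wrapped back into an xr.DataArray with the original time coordinate if needed.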