I am attempting to use xarray.apply_ufunc to get around out-of-memory issues I am hitting with the pyKrige kriging package. I am using pykrige.OrdinaryKriging3D to interpolate a large set of temperature measurements, taken at scattered (longitude, latitude, time) positions, onto a regular 3D grid of (lon_reg, lat_reg, time_reg).
This requires huge amounts of memory, and to try to get around it I have been attempting to take advantage of xarray's lazy operations, calling the pyKrige routines (which take NumPy arrays) from within apply_ufunc. However, I am still running into memory issues.
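For context, stripped of xarray, the underlying pyKrige call I am making looks roughly like the following (the array sizes and names here are placeholders for my real scattered observations and target grid); it is this step that exhausts memory once the number of observations grows:

import numpy as np
from pykrige.ok3d import OrdinaryKriging3D

# small illustrative arrays (placeholders for my real scattered observations)
obs_lon = np.random.uniform(45, 62, 300)
obs_lat = np.random.uniform(-12, 3, 300)
obs_time = np.arange(300)
obs_temp = np.random.rand(300) + 18

# regular output grid (placeholders)
grid_lon = np.linspace(45, 62, 103)
grid_lat = np.linspace(-12, 3, 91)
grid_time = np.arange(300)

# build the kriging model from the scattered points and evaluate it on the grid
ok3d = OrdinaryKriging3D(obs_lon, obs_lat, obs_time, obs_temp, variogram_model='linear')
temp_grid, variance = ok3d.execute('grid', grid_lon, grid_lat, grid_time)
# temp_grid has shape (len(grid_time), len(grid_lat), len(grid_lon))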
I have pasted some example code below using randomised data. Can anyone say whether I am making a mistake in the code, for example in the apply_ufunc call, or whether the problem is more fundamental to the kriging operation itself -- are some methods simply incompatible with lazy operations over chunked arrays? The code below works on small datasets (e.g. with sample_length set to 300), but I still run out of memory for longer sample lengths (e.g. around 2000).
Note that I have also tried running the code below on a Dask cluster with multiple CPU workers, but that did not solve the memory problems.
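For reference, the cluster was set up roughly as below (the worker count and memory limit are placeholders rather than my exact configuration):

from dask.distributed import Client, LocalCluster

# placeholder worker/memory settings; my actual cluster was larger
cluster = LocalCluster(n_workers=4, threads_per_worker=1, memory_limit="8GB")
client = Client(cluster)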
Thank you for any advice.
import numpy as np
import xarray as xr
from pykrige.ok3d import OrdinaryKriging3D

# create random example datasets, chunked along time
sample_length = 2000
da_time = xr.DataArray(data=np.arange(0, sample_length), coords=dict(time=np.arange(0, sample_length))).chunk(chunks={"time": 100})
da_lat = xr.DataArray(data=np.random.uniform(-12, 3, size=sample_length), coords=dict(time=da_time)).chunk(chunks={"time": 100})
da_lon = xr.DataArray(data=np.random.uniform(45, 62, size=sample_length), coords=dict(time=da_time)).chunk(chunks={"time": 100})
da_temp = xr.DataArray(data=np.random.rand(sample_length) + 18, coords=dict(time=da_time)).chunk(chunks={"time": 100})

# Define the function to apply to the dataset
def kriging_3d(da_lon, da_lat, da_time, da_temp):
    # regular output grid (xi = longitude, yi = latitude, zi = time)
    xi = np.linspace(45, 62, 103)
    yi = np.linspace(-12, 3, 91)
    zi = np.arange(np.min(da_time), np.max(da_time))
    # Create the 3D kriging object from the scattered observations
    OK3D = OrdinaryKriging3D(da_lon, da_lat, da_time, da_temp, variogram_model='linear')
    # Execute on the regular grid
    out, ss = OK3D.execute('grid', xi, yi, zi)
    # convert the output into an xarray object
    out = xr.DataArray(out, coords=[("zi", zi), ("yi", yi), ("xi", xi)])
    return out

out = xr.apply_ufunc(
    kriging_3d,
    da_lon, da_lat, da_time, da_temp,
    input_core_dims=[["time"], ["time"], ["time"], ["time"]],
    output_core_dims=[["zi", "yi", "xi"]],
    dask='allowed',
    vectorize=True,
)
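In case it is relevant, I also wondered whether I should be using dask='parallelized' instead of 'allowed'. My (untested) guess at that call is below; the output_sizes values are my own guesses to match the grid above, and it assumes kriging_3d is changed to return a plain NumPy array of shape (len(zi), len(yi), len(xi)) rather than a DataArray:

# sketch only: assumes kriging_3d returns a plain NumPy array, not a DataArray
out = xr.apply_ufunc(
    kriging_3d,
    da_lon, da_lat, da_time, da_temp,
    input_core_dims=[["time"], ["time"], ["time"], ["time"]],
    output_core_dims=[["zi", "yi", "xi"]],
    dask='parallelized',
    output_dtypes=[float],
    dask_gufunc_kwargs={
        "allow_rechunk": True,  # the "time" core dim spans several chunks
        "output_sizes": {"zi": sample_length - 1, "yi": 91, "xi": 103},
    },
)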