I am attempting to use xarray.apply_ufunc to get around out-of-memory issues I am hitting with the pyKrige kriging package. I am using pykrige.OrdinaryKriging3D to interpolate a large set of temperature measurements, taken at scattered (longitude, latitude, time) positions, onto a regular 3D grid of (lon_reg, lat_reg, time_reg).
This requires huge amounts of memory, and to try to get around it I have been attempting to take advantage of xarray's lazy operations, calling the pyKrige routines (which take NumPy arrays) from within apply_ufunc. However, I am still running into memory issues.
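For context, stripped of xarray, the underlying pyKrige call I am making looks roughly like the following (the array sizes and names here are placeholders for my real scattered observations and target grid); it is this step that exhausts memory once the number of observations grows:

import numpy as np
from pykrige.ok3d import OrdinaryKriging3D

# small illustrative arrays (placeholders for my real scattered observations)
obs_lon = np.random.uniform(45, 62, 300)
obs_lat = np.random.uniform(-12, 3, 300)
obs_time = np.arange(300)
obs_temp = np.random.rand(300) + 18

# regular output grid (placeholders)
grid_lon = np.linspace(45, 62, 103)
grid_lat = np.linspace(-12, 3, 91)
grid_time = np.arange(300)

# build the kriging model from the scattered points and evaluate it on the grid
ok3d = OrdinaryKriging3D(obs_lon, obs_lat, obs_time, obs_temp, variogram_model='linear')
temp_grid, variance = ok3d.execute('grid', grid_lon, grid_lat, grid_time)
# temp_grid has shape (len(grid_time), len(grid_lat), len(grid_lon))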
I have pasted some example code below using randomised data. Can anyone say whether I am making a mistake in the code, for example in the apply_ufunc call, or whether the problem is more fundamental to the kriging operation itself -- are some methods simply incompatible with lazy operations over chunked arrays? The code below works on small datasets (e.g. with sample_length set to 300), but I still run out of memory for longer sample lengths (e.g. around 2000).
Note that I have also tried running the code below on a Dask cluster with multiple CPU workers, but that did not solve the memory problems.
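For reference, the cluster was set up roughly as below (the worker count and memory limit are placeholders rather than my exact configuration):

from dask.distributed import Client, LocalCluster

# placeholder worker/memory settings; my actual cluster was larger
cluster = LocalCluster(n_workers=4, threads_per_worker=1, memory_limit="8GB")
client = Client(cluster)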
Thank you for any advice.
import numpy as np
import xarray as xr
from pykrige.ok3d import OrdinaryKriging3D

# create random example datasets, chunked along time
sample_length = 2000
da_time = xr.DataArray(data=np.arange(0, sample_length), coords=dict(time=np.arange(0, sample_length))).chunk(chunks={"time": 100})
da_lat = xr.DataArray(data=np.random.uniform(-12, 3, size=sample_length), coords=dict(time=da_time)).chunk(chunks={"time": 100})
da_lon = xr.DataArray(data=np.random.uniform(45, 62, size=sample_length), coords=dict(time=da_time)).chunk(chunks={"time": 100})
da_temp = xr.DataArray(data=np.random.rand(sample_length) + 18, coords=dict(time=da_time)).chunk(chunks={"time": 100})

# Define the function to apply to the dataset
def kriging_3d(da_lon, da_lat, da_time, da_temp):
    # regular output grid (xi = longitude, yi = latitude, zi = time)
    xi = np.linspace(45, 62, 103)
    yi = np.linspace(-12, 3, 91)
    zi = np.arange(np.min(da_time), np.max(da_time))
    # Create the 3D kriging object from the scattered observations
    OK3D = OrdinaryKriging3D(da_lon, da_lat, da_time, da_temp, variogram_model='linear')
    # Execute on the regular grid
    out, ss = OK3D.execute('grid', xi, yi, zi)
    # convert the output into an xarray object
    out = xr.DataArray(out, coords=[("zi", zi), ("yi", yi), ("xi", xi)])
    return out

out = xr.apply_ufunc(
    kriging_3d,
    da_lon, da_lat, da_time, da_temp,
    input_core_dims=[["time"], ["time"], ["time"], ["time"]],
    output_core_dims=[["zi", "yi", "xi"]],
    dask='allowed',
    vectorize=True,
)
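In case it is relevant, I also wondered whether I should be using dask='parallelized' instead of 'allowed'. My (untested) guess at that call is below; the output_sizes values are my own guesses to match the grid above, and it assumes kriging_3d is changed to return a plain NumPy array of shape (len(zi), len(yi), len(xi)) rather than a DataArray:

# sketch only: assumes kriging_3d returns a plain NumPy array, not a DataArray
out = xr.apply_ufunc(
    kriging_3d,
    da_lon, da_lat, da_time, da_temp,
    input_core_dims=[["time"], ["time"], ["time"], ["time"]],
    output_core_dims=[["zi", "yi", "xi"]],
    dask='parallelized',
    output_dtypes=[float],
    dask_gufunc_kwargs={
        "allow_rechunk": True,  # the "time" core dim spans several chunks
        "output_sizes": {"zi": sample_length - 1, "yi": 91, "xi": 103},
    },
)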