Xarray (rasterio) on Kubernetes/Dask fails to find the path when .compute() is invoked

7/31/2019

On Google Cloud I've deployed Dask using Helm and the stable/dask chart.

Once it's running, with xarray and rasterio added through the config.yaml file, I'm able to read the files using xarray.open_rasterio('...').

If I try to invoke .compute() on the object, I get an error saying that rasterio has raised an IOError because no such file could be found. It's the first time this has happened to me.

To replicate, here is my config.yaml:

worker:
  replicas: 3
  env:
    - name: EXTRA_APT_PACKAGES
      value: libzstd1
    - name: EXTRA_CONDA_PACKAGES
      value: numpy pandas scipy rasterio xarray matplotlib netcdf4 nomkl statsmodels numba gcsfs pyhdf -c conda-forge
    - name: EXTRA_PIP_PACKAGES
      value: git+https://github.com/PhenoloBoy/FenicePhenolo
jupyter:
  enabled: true
  env:
    - name: EXTRA_APT_PACKAGES
      value: apt-utils libzstd1
    - name: EXTRA_CONDA_PACKAGES
      value: numpy pandas scipy rasterio xarray matplotlib netcdf4 nomkl statsmodels numba gcsfs pyhdf -c conda-forge
    - name: EXTRA_PIP_PACKAGES
      value: git+https://github.com/PhenoloBoy/FenicePhenolo
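
For completeness, the chart would then be deployed with something like the following (assuming the Helm 2 CLI from the stable-repo era; the release name is hypothetical):

helm install stable/dask --name my-dask -f config.yaml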

And here is the script:

import xarray as xr
from distributed import Client

client = Client()  # connect to the Dask cluster
data = xr.open_rasterio('file.img', chunks=(..,..,..))  # chunk sizes elided
data.compute()
-- Cursore
dask
distributed
kubernetes
kubernetes-helm
python-xarray

1 Answer

8/4/2019

It sounds like your dask workers don't have access to the same filesystem as your client.
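As a quick check, you can ask every worker whether it can see the file. A minimal sketch using distributed's Client.run (the path is the one from the question):

import os

# Runs os.path.exists on every worker and returns a dict of
# {worker_address: bool}; False everywhere confirms the file
# only exists on the client/Jupyter pod.
client.run(os.path.exists, 'file.img')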

To elaborate: you first find the list of files and fetch some metadata on the client side. Then the workers actually load the chunks, so it is necessary that they can see exactly the same files. You must have some shared filesystem, or refer to external storage such as S3/GCS.
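If the data lives in a GCS bucket, every worker can read the same object. A minimal sketch, assuming a hypothetical bucket and path, that GDAL on the workers supports the /vsigs/ handler (rasterio translates gs:// URLs to it), and that credentials are available on the worker pods:

import xarray as xr
from distributed import Client

client = Client()
# gs://my-bucket/file.img is a placeholder path; rasterio maps the
# gs:// scheme to GDAL's /vsigs/ virtual filesystem, so chunks are
# read from the bucket rather than from a pod-local disk.
data = xr.open_rasterio('gs://my-bucket/file.img',
                        chunks={'band': 1, 'x': 1024, 'y': 1024})
result = data.compute()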

-- Ryan
Source: StackOverflow