On Google Cloud I've deployed Dask using Helm and the stable/dask chart.
Once it's running, and having added xarray and rasterio through the config.yaml file, I'm able to read the files using xarray.open_rasterio('...').
However, if I invoke .compute() on the object, I get an error saying that rasterio has raised an IOError because no such file was found. It's the first time this has happened to me.
To reproduce, here is my config.yaml:
worker:
  replicas: 3
  env:
    - name: EXTRA_APT_PACKAGES
      value: libzstd1
    - name: EXTRA_CONDA_PACKAGES
      value: numpy pandas scipy rasterio xarray matplotlib netcdf4 nomkl statsmodels numba gcsfs pyhdf -c conda-forge
    - name: EXTRA_PIP_PACKAGES
      value: git+https://github.com/PhenoloBoy/FenicePhenolo

jupyter:
  enabled: true
  env:
    - name: EXTRA_APT_PACKAGES
      value: apt-utils libzstd1
    - name: EXTRA_CONDA_PACKAGES
      value: numpy pandas scipy rasterio xarray matplotlib netcdf4 nomkl statsmodels numba gcsfs pyhdf -c conda-forge
    - name: EXTRA_PIP_PACKAGES
      value: git+https://github.com/PhenoloBoy/FenicePhenolo
And here is the script:
import xarray as xr
from distributed import Client

client = Client()
# Open the raster lazily as a chunked, dask-backed DataArray
data = xr.open_rasterio('file.img', chunks=(..,..,..))
# Computation is where the workers actually read the file and the IOError appears
data.compute()
It sounds like your dask workers don't have access to the same filesystem as your client.
To elaborate: the client first finds the list of files and fetches some metadata, and then the workers actually load the chunks, so it is necessary that they can see exactly the same files. You need either a shared filesystem or to refer to external storage such as S3/GCS.
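One quick way to confirm this is to ask every worker whether it can see the file the client sees. A minimal sketch, reusing the 'file.img' path from your script as a stand-in for your real file:

import os
from distributed import Client

client = Client()
# Run os.path.exists on every worker; the result maps each worker's
# address to True/False. Any False means that worker cannot see the
# file, and reads from it will fail with exactly the IOError you saw.
print(client.run(os.path.exists, 'file.img'))

If any worker reports False, either mount a shared volume into the worker pods or upload the rasters to GCS and read them from there; note that gcsfs is already in your EXTRA_CONDA_PACKAGES.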