No such file or directory: .../part.0.parquet

3/20/2019

After uploading a parquet file to my Kubernetes cluster for processing with Dask, I get a FileNotFoundError when trying to read it:

import dask.dataframe as dd

df = dd.read_parquet('/home/jovyan/foo.parquet')
df.head()

Here is the full error:

FileNotFoundError: [Errno 2] No such file or directory: '/home/jovyan/user_engagement_anon.parquet/part.0.parquet'

I can see that the file does indeed exist, and relative to the working directory of my Jupyter notebook instance, it's in the expected location.

I'm not sure if it matters, but to start the Dask client on my Kubernetes cluster, I used the following code:

from dask.distributed import Client, progress

client = Client('dask-scheduler:8786', processes=False, threads_per_worker=4, n_workers=1, memory_limit='1GB')
client

Furthermore, the same operation works fine on my local machine with the same parquet file.

-- BirdLaw
dask
jupyterhub
kubernetes
python

1 Answer

3/21/2019

The problem was that I was installing Dask separately using a Helm release, so the Dask workers did not share the same file system as the Jupyter notebook.
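A quick way to confirm this kind of mismatch (a minimal sketch; the scheduler address and file path are assumptions carried over from the question) is to ask every worker whether it can see the file:

import os

from dask.distributed import Client

# Connect to the existing scheduler (address assumed from the question).
client = Client('dask-scheduler:8786')

# Run os.path.exists on every worker. With workers deployed by a
# separate Helm release, each worker has its own filesystem and the
# uploaded file is missing, so this returns False per worker.
print(client.run(os.path.exists, '/home/jovyan/foo.parquet'))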

To fix this, I used the dask-kubernetes Python library to create the workers, rather than a separate Helm release.
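Something along these lines (a minimal sketch; the image name and resource values are illustrative assumptions, not my exact configuration):

from dask.distributed import Client
from dask_kubernetes import KubeCluster, make_pod_spec

# Build a worker pod template; using the same image and environment
# as the notebook keeps workers consistent with the client.
pod_spec = make_pod_spec(image='daskdev/dask:latest',
                         memory_limit='1G', memory_request='1G',
                         cpu_limit=1, cpu_request=1)

# Launch worker pods from inside the notebook and connect to them.
cluster = KubeCluster(pod_spec)
cluster.scale(1)
client = Client(cluster)

Because the workers are created from the notebook session itself, their deployment is tied to the notebook's setup rather than to a separately managed Helm release, which is what caused the filesystem mismatch.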

-- BirdLaw
Source: StackOverflow