MinIO, when used with Spark on an EKS cluster, gives an access denied error

10/26/2021

I am using MinIO and have launched a MinIO gateway with Helm on an Amazon EKS Kubernetes cluster. I have added the properties below on the Spark side:

sparkConf.set("fs.s3a.endpoint", "minio-k8s-service':9000");
sparkConf.set("fs.s3a.connection.ssl.enabled", "false");
sparkConf.set("fs.s3a.signing-algorithm", "S3SignerType");
sparkConf.set("s.s3a.connection.timeout", "100000");
sparkConf.set("spark.master", "k8sSchedulerURL");
sparkConf.set("spark.deploy.mode", "cluster");
sparkConf.set("fs.s3a.committer.staging.conflict-mode", "replace");sparkConf.set("spark.hadoop.fs.s3a.access.key","myaccesskey")sparkConf.set("spark.hadoop.fs.s3a.secret.key","mysecretkey")

The line of code below works fine when I read a file from S3:

new JavaSparkContext(session.sparkContext()).textFile("s3a://mybucket/myfolder/sample.parquet", 1)

However, if I try to load a file as shown below, it fails with an access denied error:

session.read().parquet("s3a://mybucket/myfolder/myfile.parquet")

It fails with the error: getFileStatus on s3a://mybucket/myfolder/testfile.parquet: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: XYZ123XYZ; S3 Extended Request ID: null), S3 Extended Request ID: null:403 Forbidden

I am using the hadoop-aws 3.2.0 jar with Spark 3.1.1. My access key and secret key are correct, and I have tried every option I could find. The error seems strange, since it appears even when the right credentials are passed.

Any help appreciated.

-- Naresh G
amazon-eks
apache-spark
kubernetes
minio
pyspark

1 Answer

2/28/2022

You might have to set fs.s3a.path.style.access to true. MinIO behind a single service endpoint generally needs path-style requests, where the bucket name goes in the URL path rather than the hostname; without that setting, the S3A client's default virtual-hosted-style requests can fail against MinIO even with correct credentials. If that doesn't work for you, the MinIO team is available on their public Slack channel or by email to answer questions 24/7/365.
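As a minimal sketch, assuming the same sparkConf object from the question, the setting could be applied like this:

// Force path-style access so s3a://mybucket/... is requested as
// http://minio-k8s-service:9000/mybucket/... instead of prepending
// the bucket name to the hostname.
sparkConf.set("spark.hadoop.fs.s3a.path.style.access", "true");

Note the spark.hadoop. prefix: Spark only forwards spark.hadoop.* entries from SparkConf into the Hadoop configuration, so the fs.s3a.* options set earlier without that prefix may not be reaching the S3A filesystem at all; prefixing those the same way is worth trying too.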

-- r1j1m1n1
Source: StackOverflow