Note: Baking keys into an image is the worst you can do, I did this here to have a binary equal filesystem between Docker and Kubernetes while debugging.
I am trying to start up a flink-jobmanager that persists its state in GCS, so I added a high-availability.storageDir: gs://BUCKET/ha
line to my flink-conf.yaml
and I am building my Dockerfile as described here
This is my Dockerfile:
FROM flink:1.5-hadoop28
ADD https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar /opt/flink/lib/gcs-connector-latest-hadoop2.jar
RUN mkdir /opt/flink/etc-hadoop
COPY flink-conf.yaml /opt/flink/conf/flink-conf.yaml
COPY key.json /opt/flink/etc-hadoop/key.json
COPY core-site.xml /opt/flink/etc-hadoop/core-site.xml
Now if I build this container via docker build -t flink:dev .
and start an interactive shell in it like docker run -ti flink:dev /bin/bash
, I am able to start the flink jobmanager via:
flink-console.sh jobmanager --configDir=/opt/flink/conf/ --executionMode=cluster
Flink is picking up the jar's and starting normally. However, when I use the following yaml for starting it on Kubernetes, based on the one here:
apiVersion: apps/v1
kind: Deployment
metadata:
name: flink-jobmanager
spec:
replicas: 1
selector:
matchLabels:
app: flink
component: jobmanager
template:
metadata:
labels:
app: flink
component: jobmanager
spec:
containers:
- name: jobmanager
image: flink:dev
imagePullPolicy: Always
resources:
requests:
memory: "1024Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
args: ["jobmanager"]
ports:
- containerPort: 6123
name: rpc
- containerPort: 6124
name: blob
- containerPort: 6125
name: query
- containerPort: 8081
name: ui
- containerPort: 46110
name: ha
env:
- name: GOOGLE_APPLICATION_CREDENTIALS
value: /opt/flink/etc-hadoop/key.json
- name: JOB_MANAGER_RPC_ADDRESS
value: flink-jobmanager
Flink seems to be unable to register the filesystem:
2018-10-04 09:20:51,357 DEBUG org.apache.flink.runtime.util.HadoopUtils - Cannot find hdfs-default configuration-file path in Flink config.
2018-10-04 09:20:51,358 DEBUG org.apache.flink.runtime.util.HadoopUtils - Cannot find hdfs-site configuration-file path in Flink config.
2018-10-04 09:20:51,359 DEBUG org.apache.flink.runtime.util.HadoopUtils - Adding /opt/flink/etc-hadoop//core-site.xml to hadoop configuration
2018-10-04 09:20:51,767 DEBUG org.apache.hadoop.security.UserGroupInformation - PrivilegedActionException as:flink (auth:SIMPLE) cause:java.io.IOException: Could not create FileSystem for highly available storage (high-availability.storageDir)
2018-10-04 09:20:51,767 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Cluster initialization failed.
java.io.IOException: Could not create FileSystem for highly available storage (high-availability.storageDir)
at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:122)
at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:95)
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:115)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:402)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:270)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:225)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:189)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:188)
at org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint.main(StandaloneSessionClusterEntrypoint.java:91)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'gs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:405)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:119)
... 12 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop File System abstraction does not support scheme 'gs'. Either no file system implementation exists for that scheme, or the relevant classes are missing from the classpath.
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:102)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401)
... 15 more
Caused by: java.io.IOException: No FileSystem for scheme: gs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2799)
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:99)
... 16 more
As Kubernetes should be using the same image, I am confused how this is possible. Am I overseeing something here?
The problem was using the dev
tag. Using specific version tags fixed the issue.