Flink Job Deployment In Kubernetes

12/12/2019

I am trying to deploy a Flink job in Kubernetes cluster (Azure AKS). The Job Cluster is getting aborted just after starting but Task manager is running fine.

The docker image is created successfully without any exception. I am able to run the docker image as well as able to SSH to docker image.

I have followed steps mentioned in the below link:

https://github.com/apache/flink/tree/release-1.9/flink-container/kubernetes

While creating image I have provided Job jar and it has been copied on "/opt/artifacts" inside the image. But still not getting why getting below exception in Job Cluster pod log.

Caused by: org.apache.flink.util.FlinkException: Failed to find job JAR on class path. Please provide the job class name explicitly.

I am new in Kubernetes, Could you please give me some clue to debug this issue.

Please find below complete logs:

A. flink-job-cluster Pod Log

develk@ACIDLAELKV01:~/cntx_eng$ kubectl logs flink-job-cluster-kszwf
Starting the job-cluster
Starting standalonejob as a console application on host flink-job-cluster-kszwf.
2019-12-12 10:37:17,170 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - --------------------------------------------------------------------------------
2019-12-12 10:37:17,172 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Starting StandaloneJobClusterEntryPoint (Version: 1.8.0, Rev:4caec0d, Date:03.04.2019 @ 13:25:54 PDT)
2019-12-12 10:37:17,172 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  OS current user: flink
2019-12-12 10:37:17,173 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Current Hadoop/Kerberos user: <no hadoop dependency found>
2019-12-12 10:37:17,173 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM: OpenJDK 64-Bit Server VM - IcedTea - 1.8/25.212-b04
2019-12-12 10:37:17,173 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Maximum heap size: 989 MiBytes
2019-12-12 10:37:17,173 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JAVA_HOME: /usr/lib/jvm/java-1.8-openjdk/jre
2019-12-12 10:37:17,174 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  No Hadoop Dependency available
2019-12-12 10:37:17,174 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM Options:
2019-12-12 10:37:17,174 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Xms1024m
2019-12-12 10:37:17,174 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Xmx1024m
2019-12-12 10:37:17,174 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dlog4j.configuration=file:/opt/flink-1.8.0/conf/log4j-console.properties
2019-12-12 10:37:17,175 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dlogback.configurationFile=file:/opt/flink-1.8.0/conf/logback-console.xml
2019-12-12 10:37:17,175 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Program Arguments:
2019-12-12 10:37:17,175 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     --configDir
2019-12-12 10:37:17,175 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     /opt/flink-1.8.0/conf
2019-12-12 10:37:17,175 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Djobmanager.rpc.address=flink-job-cluster
2019-12-12 10:37:17,175 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dparallelism.default=1
2019-12-12 10:37:17,176 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dblob.server.port=6124
2019-12-12 10:37:17,176 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dqueryable-state.server.ports=6125
2019-12-12 10:37:17,176 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Classpath: /opt/flink-1.8.0/lib/log4j-1.2.17.jar:/opt/flink-1.8.0/lib/slf4j-log4j12-1.7.15.jar:/opt/flink-1.8.0/lib/flink-dist_2.11-1.8.0.jar:::
2019-12-12 10:37:17,176 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - --------------------------------------------------------------------------------
2019-12-12 10:37:17,178 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-12-12 10:37:17,306 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, localhost
2019-12-12 10:37:17,306 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2019-12-12 10:37:17,307 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.size, 1024m
2019-12-12 10:37:17,307 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.size, 1024m
2019-12-12 10:37:17,307 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2019-12-12 10:37:17,307 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2019-12-12 10:37:17,336 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Starting StandaloneJobClusterEntryPoint.
2019-12-12 10:37:17,336 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Install default filesystem.
2019-12-12 10:37:17,343 INFO  org.apache.flink.core.fs.FileSystem                           - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2019-12-12 10:37:17,352 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Install security context.
2019-12-12 10:37:17,362 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory  - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2019-12-12 10:37:17,381 INFO  org.apache.flink.runtime.security.SecurityUtils               - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2019-12-12 10:37:17,382 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Initializing cluster services.
2019-12-12 10:37:17,638 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Trying to start actor system at flink-job-cluster:6123
2019-12-12 10:37:18,163 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2019-12-12 10:37:18,237 INFO  akka.remote.Remoting                                          - Starting remoting
2019-12-12 10:37:18,366 INFO  akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink@flink-job-cluster:6123]
2019-12-12 10:37:18,375 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Actor system started at akka.tcp://flink@flink-job-cluster:6123
2019-12-12 10:37:18,398 INFO  org.apache.flink.configuration.Configuration                  - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
2019-12-12 10:37:18,407 INFO  org.apache.flink.runtime.blob.BlobServer                      - Created BLOB server storage directory /tmp/blobStore-63338044-67c1-4872-a3d9-c94563b3a7c3
2019-12-12 10:37:18,412 INFO  org.apache.flink.runtime.blob.BlobServer                      - Started BLOB server at 0.0.0.0:6124 - max concurrent requests: 50 - max backlog: 1000
2019-12-12 10:37:18,428 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl           - No metrics reporter configured, no metrics will be exposed/reported.
2019-12-12 10:37:18,430 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Trying to start actor system at flink-job-cluster:0
2019-12-12 10:37:18,464 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2019-12-12 10:37:18,472 INFO  akka.remote.Remoting                                          - Starting remoting
2019-12-12 10:37:18,480 INFO  akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink-metrics@flink-job-cluster:33529]
2019-12-12 10:37:18,482 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Actor system started at akka.tcp://flink-metrics@flink-job-cluster:33529
2019-12-12 10:37:18,490 INFO  org.apache.flink.runtime.blob.TransientBlobCache              - Created BLOB cache storage directory /tmp/blobStore-ba64dcdb-5095-41fc-9c98-0f1528d95c40
2019-12-12 10:37:18,514 INFO  org.apache.flink.configuration.Configuration                  - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
2019-12-12 10:37:18,515 WARN  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint  - Upload directory /tmp/flink-web-f6be0c2d-5099-4bd6-bc72-a0ae1fc6448e/flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2019-12-12 10:37:18,516 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint  - Created directory /tmp/flink-web-f6be0c2d-5099-4bd6-bc72-a0ae1fc6448e/flink-web-upload for file uploads.
2019-12-12 10:37:18,603 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint  - Starting rest endpoint.
2019-12-12 10:37:18,872 WARN  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Log file environment variable 'log.file' is not set.
2019-12-12 10:37:18,872 WARN  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'Key: 'web.log.path' , default: null (fallback keys: [{key=jobmanager.web.log.path, isDeprecated=true}])'.
2019-12-12 10:37:19,115 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint  - Rest endpoint listening at flink-job-cluster:8081
2019-12-12 10:37:19,116 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint  - http://flink-job-cluster:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2019-12-12 10:37:19,116 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint  - Web frontend listening at http://flink-job-cluster:8081.
2019-12-12 10:37:19,239 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
2019-12-12 10:37:19,262 INFO  org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever  - Scanning class path for job JAR
2019-12-12 10:37:19,270 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint  - Shutting down rest endpoint.
2019-12-12 10:37:19,295 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint  - Removing cache directory /tmp/flink-web-f6be0c2d-5099-4bd6-bc72-a0ae1fc6448e/flink-web-ui
2019-12-12 10:37:19,299 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint  - http://flink-job-cluster:8081 lost leadership
2019-12-12 10:37:19,299 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint  - Shut down complete.
2019-12-12 10:37:19,302 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Shutting StandaloneJobClusterEntryPoint down with application status FAILED. Diagnostics org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
        at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:257)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:224)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:172)
        at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:171)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:535)
        at org.apache.flink.container.entrypoint.StandaloneJobClusterEntryPoint.main(StandaloneJobClusterEntryPoint.java:105)
Caused by: org.apache.flink.util.FlinkException: Failed to find job JAR on class path. Please provide the job class name explicitly.
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.getJobClassNameOrScanClassPath(ClassPathJobGraphRetriever.java:131)
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.createPackagedProgram(ClassPathJobGraphRetriever.java:114)
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.retrieveJobGraph(ClassPathJobGraphRetriever.java:96)
        at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:62)
        at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:41)
        at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:184)
        ... 6 more
Caused by: java.util.NoSuchElementException: No JAR with manifest attribute for entry class
        at org.apache.flink.container.entrypoint.JarManifestParser.findOnlyEntryClass(JarManifestParser.java:80)
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.scanClassPathForJobJar(ClassPathJobGraphRetriever.java:137)
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.getJobClassNameOrScanClassPath(ClassPathJobGraphRetriever.java:129)
        ... 11 more
.
2019-12-12 10:37:19,305 INFO  org.apache.flink.runtime.blob.BlobServer                      - Stopped BLOB server at 0.0.0.0:6124
2019-12-12 10:37:19,305 INFO  org.apache.flink.runtime.blob.TransientBlobCache              - Shutting down BLOB cache
2019-12-12 10:37:19,315 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Stopping Akka RPC service.
2019-12-12 10:37:19,320 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Shutting down remote daemon.
2019-12-12 10:37:19,321 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Shutting down remote daemon.
2019-12-12 10:37:19,323 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remote daemon shut down; proceeding with flushing remote transports.
2019-12-12 10:37:19,325 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remote daemon shut down; proceeding with flushing remote transports.
2019-12-12 10:37:19,354 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remoting shut down.
2019-12-12 10:37:19,356 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remoting shut down.
2019-12-12 10:37:19,378 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Stopped Akka RPC service.
2019-12-12 10:37:19,382 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Could not start cluster entrypoint StandaloneJobClusterEntryPoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint StandaloneJobClusterEntryPoint.
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:190)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:535)
        at org.apache.flink.container.entrypoint.StandaloneJobClusterEntryPoint.main(StandaloneJobClusterEntryPoint.java:105)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
        at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:257)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:224)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:172)
        at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:171)
        ... 2 more
Caused by: org.apache.flink.util.FlinkException: Failed to find job JAR on class path. Please provide the job class name explicitly.
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.getJobClassNameOrScanClassPath(ClassPathJobGraphRetriever.java:131)
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.createPackagedProgram(ClassPathJobGraphRetriever.java:114)
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.retrieveJobGraph(ClassPathJobGraphRetriever.java:96)
        at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:62)
        at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:41)
        at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:184)
        ... 6 more
Caused by: java.util.NoSuchElementException: No JAR with manifest attribute for entry class
        at org.apache.flink.container.entrypoint.JarManifestParser.findOnlyEntryClass(JarManifestParser.java:80)
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.scanClassPathForJobJar(ClassPathJobGraphRetriever.java:137)
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.getJobClassNameOrScanClassPath(ClassPathJobGraphRetriever.java:129)
        ... 11 more
develk@ACIDLAELKV01:~/cntx_eng$

Now, I have added Job class name as in argument section of "job-cluster-job.yaml.template" file.

Like below:

args: ["job-cluster", 
               "--job-classname", "com.flink.wordCountSimple",
               "-Djobmanager.rpc.address=flink-job-cluster",

But after that I am getting below exception:

Caused by: org.apache.flink.util.FlinkException: Could not load the provided entrypoint class.

Please see below detail log.

2019-12-13 19:08:34,323 INFO  org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint  - Shut down complete.
2019-12-13 19:08:34,329 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Shutting StandaloneJobClusterEntryPoint down with application status FAILED. Diagnostics org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
        at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:257)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:224)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:172)
        at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:171)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:535)
        at org.apache.flink.container.entrypoint.StandaloneJobClusterEntryPoint.main(StandaloneJobClusterEntryPoint.java:105)
Caused by: org.apache.flink.util.FlinkException: Could not load the provided entrypoint class.
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.createPackagedProgram(ClassPathJobGraphRetriever.java:119)
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.retrieveJobGraph(ClassPathJobGraphRetriever.java:96)
        at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:62)
        at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:41)
        at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:184)
        ... 6 more
Caused by: java.lang.ClassNotFoundException: com.flink.wordCountSimple
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.createPackagedProgram(ClassPathJobGraphRetriever.java:116)
        ... 10 more
.
2019-12-13 19:08:34,337 INFO  org.apache.flink.runtime.blob.BlobServer                      - Stopped BLOB server at 0.0.0.0:6124
2019-12-13 19:08:34,338 INFO  org.apache.flink.runtime.blob.TransientBlobCache              - Shutting down BLOB cache
2019-12-13 19:08:34,364 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Stopping Akka RPC service.
2019-12-13 19:08:34,368 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Shutting down remote daemon.
2019-12-13 19:08:34,372 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remote daemon shut down; proceeding with flushing remote transports.
2019-12-13 19:08:34,392 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Shutting down remote daemon.
2019-12-13 19:08:34,392 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remote daemon shut down; proceeding with flushing remote transports.
2019-12-13 19:08:34,406 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remoting shut down.
2019-12-13 19:08:34,410 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remoting shut down.
2019-12-13 19:08:34,434 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Stopped Akka RPC service.
2019-12-13 19:08:34,443 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Could not start cluster entrypoint StandaloneJobClusterEntryPoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint StandaloneJobClusterEntryPoint.
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:190)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:535)
        at org.apache.flink.container.entrypoint.StandaloneJobClusterEntryPoint.main(StandaloneJobClusterEntryPoint.java:105)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
        at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:257)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:224)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:172)
        at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:171)
        ... 2 more
Caused by: org.apache.flink.util.FlinkException: Could not load the provided entrypoint class.
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.createPackagedProgram(ClassPathJobGraphRetriever.java:119)
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.retrieveJobGraph(ClassPathJobGraphRetriever.java:96)
        at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:62)
        at org.apache.flink.runtime.dispatcher.JobDispatcherFactory.createDispatcher(JobDispatcherFactory.java:41)
        at org.apache.flink.runtime.entrypoint.component.AbstractDispatcherResourceManagerComponentFactory.create(AbstractDispatcherResourceManagerComponentFactory.java:184)
        ... 6 more
Caused by: java.lang.ClassNotFoundException: com.flink.wordCountSimple
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.flink.container.entrypoint.ClassPathJobGraphRetriever.createPackagedProgram(ClassPathJobGraphRetriever.java:116)
        ... 10 more
-- Atanu chatterjee
apache-flink
azure
azure-aks
devops
kubernetes

2 Answers

12/13/2019

There's a complete, working example of creating and running a flink job cluster on kubernetes in https://github.com/alpinegizmo/flink-containers-example. Maybe that will help. See also https://www.youtube.com/watch?v=ceZtUDgh2TE.

-- David Anderson
Source: StackOverflow

3/26/2020
version: "2.1"
services:
  jobmanager:
    build:
      context: ./
      args: 
        JAR_FILE: flink-event-tracker-bundled-1.6.0.jar
    image: test/flink-event-tracker
    expose:
      - "6123"
    ports:
      - "8081:8081"
      - "6123:6123"
    command: job-cluster --job-classname com.company.test.flink.pipelines.KafkaPipelineConsumer -Djobmanager.rpc.address=jobmanager --runner=FlinkRunner --streaming=true --checkpointingInterval=30000
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
      - JOB_MANAGER=jobmanager
    volumes:
      - data-volume:/docker/volumes

  taskmanager:
    image: test/flink-event-tracker
    expose:
      - "6121"
      - "6122"
    depends_on:
      - jobmanager
    command: task-manager -Djobmanager.rpc.address=jobmanager
    links:
      - "jobmanager:jobmanager"
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
      - JOB_MANAGER=jobmanager
    volumes:
      - data-volume:/docker/volumes

volumes:
  data-volume:
    driver: local
    driver_opts:
        o: bind
        type: none
        device: /Users/home/Development/docker/volumes/flink

Docker file

FROM flink:1.9

ARG JAR_FILE=""

ENV APP_OPTS ""
ENV JAVA_OPTS ""
ENV JOB_MANAGER=""

# Build arg allows passing the version at runtime
ARG VERSION=unset-version

COPY flink-conf.yml $FLINK_HOME/conf/flink-conf.yaml

COPY  target/$JAR_FILE $FLINK_HOME/lib/event-tracker.jar

COPY docker-cluster-entrypoint.sh /docker-cluster-entrypoint.sh

RUN apt-get update && apt-get install procps -y && apt-get install curl -y
RUN echo "root:root" | chpasswd

RUN chmod 777 /docker-cluster-entrypoint.sh
RUN chmod 777 $FLINK_HOME/lib/event-tracker.jar

ENTRYPOINT [ "bash","/docker-cluster-entrypoint.sh" ]

docker-cluster-entrypoint.sh

FLINK_HOME=${FLINK_HOME:-"/opt/flink/bin"}

JOB_CLUSTER="job-cluster"
TASK_MANAGER="task-manager"

CMD="$1"
shift;

if [ "${CMD}" = "--help" -o "${CMD}" = "-h" ]; then
    echo "Usage: $(basename $0) (${JOB_CLUSTER}|${TASK_MANAGER})"
    exit 0
elif [ "${CMD}" = "${JOB_CLUSTER}" -o "${CMD}" = "${TASK_MANAGER}" ]; then
    echo "Starting the ${CMD}"

    if [ "${CMD}" = "${TASK_MANAGER}" ]; then
        exec $FLINK_HOME/bin/taskmanager.sh start-foreground "$@"
    else
        exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$@"
    fi
fi

How to run:-

mvn clean install

docker-compose -f docker-compose.local.yml up --scale taskmanager=2 >  exceptionlog.log

docker-compose -f docker-compose.local.yml build

this is the entire conf that runs your docker. but if you want to run in kube, just convert the docker-compose file to its corresponding kube files...remaining can stay same.. may be do a helm that way kube maintenance is better.

Note:- we are using apache beam to code the job

-- aditya kumar
Source: StackOverflow