Flink 1.7.0 Dashboard does not show Task Statistics

4/24/2019

I open the Flink 1.7 dashboard and select a streaming job. It should show me some metrics, but they never finish loading.

I deployed the same job in a Flink 1.5 cluster, and there I can see the metrics. Flink is running in Docker Swarm, but if I run Flink 1.7 with docker-compose (not in the swarm), it works.

[Image: Flink 1.7 dashboard]

I can make it work by deleting the hostname from the docker-compose.yaml file. This is the original file:

version: "3"
services:
  jobmanager17:
    image: flink:1.7.0-hadoop27-scala_2.11
    hostname: "{{.Node.Hostname}}"
    ports:
      - "8081:8081"
      - "9254:9249"
    command: jobmanager
....

After deleting the hostname:

version: "3"
services:
  jobmanager17:
    image: flink:1.7.0-hadoop27-scala_2.11
    ports:
      - "8081:8081"
      - "9254:9249"
    command: jobmanager
....

Now the metrics work, but without the hostname.

Is it possible to have both?

PS: I read something about 'detached mode', but I don't use it.

-- Antonio Miranda
apache-flink
docker
flink-streaming
kubernetes

1 Answer

4/25/2019

I guess you are running your cluster on Kubernetes or Docker Swarm. With Flink 1.7 on Kubernetes you need to make sure the task managers register with the job manager using their IP addresses and not their hostnames. If you look at the job manager's log you'll find a lot of warnings that the task manager can't be reached.

You can do that by defining the taskmanager.host parameter. An example deployment might look like this:

apiVersion: extensions/v1beta1
kind: Deployment
....
spec:
  template:
    spec:
      containers:
      - name: "<%= name %>"
        args: ["taskmanager", "-Dtaskmanager.host=$(K8S_POD_IP)"]
        env:
          - name: K8S_POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP

If you are not running on Kubernetes, it might be worth trying to pass this parameter manually (by providing an IP address that is reachable from the job manager as taskmanager.host).
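
As a minimal sketch of the manual variant, assuming the setting goes into the flink-conf.yaml of each task manager, it could look like this. The address is a hypothetical placeholder and not taken from the question; each task manager would need its own reachable IP:

    # flink-conf.yaml on one task manager (sketch)
    # 10.0.0.12 is a made-up example address; use an IP the job manager can reach
    taskmanager.host: 10.0.0.12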

Hope that helps.


Update: Flink 1.8 solves the problem. The property taskmanager.network.bind-policy is set to "ip" by default, which does more or less the same as the workaround described above (https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#taskmanager)
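
For reference, a sketch of the corresponding flink-conf.yaml entry, assuming Flink 1.8 or later. Since "ip" is the default there, setting it explicitly should only matter if the value was changed to "name" (bind by hostname) somewhere:

    # flink-conf.yaml (sketch, Flink 1.8+)
    # Only consulted when taskmanager.host is not set
    taskmanager.network.bind-policy: ip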

-- TobiSH
Source: StackOverflow