I can't reach the Hadoop server from the Hadoop client

11/5/2018

The Hadoop server runs in Kubernetes, and the Hadoop client is on an external network, so I try to reach the Hadoop server through a Kubernetes Service. However, hadoop fs -put does not work from the Hadoop client. As far as I know, the namenode gives the datanode IPs to the Hadoop client. If so, where does the namenode get those IPs from?
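
For reference, the Service I mean looks roughly like this minimal sketch (the name, labels, and ports are placeholders; the ports assume Hadoop 3.x defaults):

apiVersion: v1
kind: Service
metadata:
  name: hadoop-namenode          # hypothetical name
spec:
  type: NodePort                 # expose the namenode outside the cluster
  selector:
    app: hadoop-namenode         # assumed pod label
  ports:
    - name: rpc
      port: 8020                 # namenode RPC port (what fs.defaultFS points at)
      targetPort: 8020
    - name: http
      port: 9870                 # namenode web UI (Hadoop 3.x default)
      targetPort: 9870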

-- K.k
hadoop
kubernetes

2 Answers

11/5/2018

You can check my other answer: HDFS is not production-ready in K8s yet (as of this writing).

The namenode gives the client the IP addresses of the datanodes; it learns those addresses when the datanodes join the cluster, as shown below:

[screenshot: datanodes registered with the namenode]

The issue in K8s is that you would have to expose each datanode as a Service or an external IP, but the namenode sees the datanodes by their pod IP addresses, which are not reachable from outside the cluster. Also, HDFS doesn't provide a per-datanode config for publishing an IP that would let you force it to use a Service IP, so you'll have to do fancy custom networking, or your client has to be inside the podCidr (which kind of defeats the purpose of HDFS being a distributed filesystem).
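
For illustration, exposing each datanode as its own Service would look roughly like the sketch below (the names are hypothetical, and it assumes the datanodes run as a StatefulSet so each pod carries the statefulset.kubernetes.io/pod-name label); even then, the namenode still hands the client the pod IP rather than the Service IP:

apiVersion: v1
kind: Service
metadata:
  name: datanode-0                 # one Service per datanode pod (hypothetical name)
spec:
  type: NodePort                   # or LoadBalancer / an external IP
  selector:
    statefulset.kubernetes.io/pod-name: datanode-0   # pin the Service to a single pod
  ports:
    - name: data
      port: 9866                   # datanode data-transfer port (Hadoop 3.x default)
      targetPort: 9866
    - name: ipc
      port: 9867                   # datanode IPC port (Hadoop 3.x default)
      targetPort: 9867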

-- Rico
Source: StackOverflow

11/5/2018

If you need the IP of the node where the pod is running, you can use an env var populated via the Downward API:

apiVersion: v1
kind: Pod
metadata:
  name: get-host-ip
spec:
  containers:
    - name: test-container
      image: k8s.gcr.io/busybox
      command: [ "sh", "-c"]
      args:
      - while true; do
          printenv HOST_IP;
        done;
      env:
        - name: HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
  restartPolicy: Never
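
Once the pod is running, kubectl logs get-host-ip should show the node IP printed by the container; if you need the pod IP instead, the same fieldRef pattern works with status.podIP.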

API docs: PodStatus v1 core

-- Arslanbekov
Source: StackOverflow