The Hadoop server runs in Kubernetes, and the Hadoop client is on an external network, so I am trying to reach the Hadoop server through a Kubernetes Service. However, hadoop fs -put does not work from the Hadoop client. As far as I know, the namenode gives the datanode IPs to the Hadoop client. If so, where does the namenode get those IPs from?
You can check my other answer. HDFS is not production-ready in K8s yet (as of this writing).
The namenode gives the client the IP addresses of the datanodes; it learns those addresses when the datanodes register with it on joining the cluster.
The issue in K8s is that you would have to expose each datanode as a Service or external IP, but the namenode sees the datanodes by their pod IP addresses, which are not reachable from the outside world. HDFS also doesn't provide a per-datanode "published IP" configuration that would let you force clients to use a Service IP, so you would have to do fancy custom networking, or your client would have to sit inside the pod CIDR (which somewhat defeats the purpose of HDFS being a distributed filesystem).
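For illustration, here is a minimal sketch of what "exposing each datanode as a Service" could look like. The Service name, the selector, and the StatefulSet assumption are all hypothetical; the port is the Hadoop 3 default data-transfer port (dfs.datanode.address). You would need one such Service per datanode, and the namenode would still advertise pod IPs to clients, which is exactly the problem described above:

apiVersion: v1
kind: Service
metadata:
  name: hdfs-datanode-0              # hypothetical: one Service per datanode pod
spec:
  type: NodePort                     # or LoadBalancer, to be reachable from outside the cluster
  selector:
    statefulset.kubernetes.io/pod-name: hdfs-datanode-0   # assumes datanodes run as a StatefulSet
  ports:
    - name: data-transfer
      port: 9866                     # Hadoop 3 default for dfs.datanode.address
      targetPort: 9866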
If you need the IP of the node the pod is running on, you can expose it to the container as an environment variable through the Downward API:
apiVersion: v1
kind: Pod
metadata:
  name: get-host-ip
spec:
  containers:
    - name: test-container
      image: k8s.gcr.io/busybox
      command: [ "sh", "-c" ]
      args:                          # print the node IP every 10 seconds
        - while true; do
            printenv HOST_IP;
            sleep 10;
          done;
      env:
        - name: HOST_IP              # injected from the pod's status via the Downward API
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP   # IP of the node the pod is scheduled on
  restartPolicy: Never
API docs: PodStatus v1 core
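After applying the manifest, the node IP appears in the pod's output (kubectl logs get-host-ip). If you need the pod's own IP rather than the node's, use status.podIP as the fieldPath instead.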