I'm trying to set up a sample Hadoop cluster on OpenShift/Kubernetes/Docker (OpenShift 3.5), and I've run into the following issue:
Only one datanode stays registered with the namenode at a time, because the namenode sees all datanodes under the same IP (192.168.20.1). This is apparently due to a network route in the cluster.
Actual sample configuration:
Namenode
192.168.20.119 hadoop-namenode-10-qp83z
Datanodes
192.168.20.132 hadoop-slave-0.hadoop-slave.my-project.svc.cluster.local hadoop-slave-0
192.168.20.133 hadoop-slave-1.hadoop-slave.my-project.svc.cluster.local hadoop-slave-1
192.168.20.134 hadoop-slave-2.hadoop-slave.my-project.svc.cluster.local hadoop-slave-2
Namenode log:
17/12/05 22:11:21 INFO net.NetworkTopology: Removing a node: /default-rack/192.168.20.1:50010
17/12/05 22:11:21 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.20.1:50010
17/12/05 22:11:21 INFO blockmanagement.BlockReportLeaseManager: Registered DN f3c22144-f9cf-47dc-b0b7-bf946121ee81 (192.168.20.1:50010).
17/12/05 22:11:21 INFO blockmanagement.DatanodeDescriptor: Adding new storage ID DS-6f7b2565-1e85-491a-ab04-69a7ffa25d5c for DN 192.168.20.1:50010
17/12/05 22:11:21 INFO BlockStateChange: BLOCK* processReport 0x9c1289bc1f9f766f: Processing first storage report for DS-6f7b2565-1e85-491a-ab04-69a7ffa25d5c from datanode f3c22144-f9cf-47dc-b0b7-bf946121ee81
17/12/05 22:11:21 INFO BlockStateChange: BLOCK* processReport 0x9c1289bc1f9f766f: from storage DS-6f7b2565-1e85-491a-ab04-69a7ffa25d5c node DatanodeRegistration(192.168.20.1, datanodeUuid=f3c22144-f9cf-47dc-b0b7-bf946121ee81, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-6b84af8f-fe9a-465a-840e-6acb0fe5f8d9;nsid=399770301;c=0), blocks: 0, hasStaleStorage: false, processing time: 0 msecs, invalidatedBlocks: 0
17/12/05 22:11:21 INFO hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(192.168.20.1, datanodeUuid=2bd926b9-b00e-4eb6-858d-3e90fa6b3ef8, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-6b84af8f-fe9a-465a-840e-6acb0fe5f8d9;nsid=399770301;c=0) storage 2bd926b9-b00e-4eb6-858d-3e90fa6b3ef8
17/12/05 22:11:21 INFO namenode.NameNode: BLOCK* registerDatanode: 192.168.20.1:50010
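To make the symptom in the log explicit, here is a minimal Python sketch that groups datanode UUIDs by the IP they registered with. The two sample lines are abridged from the log above; the function name `uuids_by_ip` and the regular expression are my own illustration, not part of Hadoop.

```python
import re
from collections import defaultdict

# Abridged from the namenode log above: only the DatanodeRegistration part is kept.
LOG_LINES = [
    "DatanodeRegistration(192.168.20.1, datanodeUuid=f3c22144-f9cf-47dc-b0b7-bf946121ee81, infoPort=50075)",
    "DatanodeRegistration(192.168.20.1, datanodeUuid=2bd926b9-b00e-4eb6-858d-3e90fa6b3ef8, infoPort=50075)",
]

# Matches the IP and UUID inside a DatanodeRegistration(...) entry.
REGISTRATION = re.compile(
    r"DatanodeRegistration\((?P<ip>[\d.]+), datanodeUuid=(?P<uuid>[0-9a-f-]+)"
)


def uuids_by_ip(lines):
    """Return a dict mapping each registered IP to the set of datanode UUIDs seen for it."""
    grouped = defaultdict(set)
    for line in lines:
        match = REGISTRATION.search(line)
        if match:
            grouped[match.group("ip")].add(match.group("uuid"))
    return dict(grouped)


if __name__ == "__main__":
    for ip, uuids in uuids_by_ip(LOG_LINES).items():
        # More than one UUID per IP means distinct datanodes collide on one address.
        print(ip, len(uuids), sorted(uuids))
```

Two different UUIDs behind 192.168.20.1 would be consistent with each registration replacing the previous one, which matches the "Removing a node" / "Adding a new node" pair in the log.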
Configuration (hdfs-site.xml):
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>true</value> <!-- same result with false -->
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value> <!-- same result with false -->
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
Output of ip route on all pods:
ip route
default via 192.168.20.1 dev eth0
192.168.0.0/16 dev eth0
192.168.20.0/24 dev eth0 proto kernel scope link src 192.168.20.134
224.0.0.0/4 dev eth0
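As a sanity check on the addresses above, here is a small Python sketch using the standard `ipaddress` module. It only restates facts from this post (the pod IPs, the /24 route, and the default gateway); the function name `diagnose` is hypothetical. That the traffic is being source-NATed through the gateway is my assumption, not something I have confirmed.

```python
import ipaddress

# Values copied from the configuration and `ip route` output above.
POD_SUBNET = ipaddress.ip_network("192.168.20.0/24")
DEFAULT_GATEWAY = ipaddress.ip_address("192.168.20.1")
DATANODE_IPS = [
    ipaddress.ip_address("192.168.20.132"),
    ipaddress.ip_address("192.168.20.133"),
    ipaddress.ip_address("192.168.20.134"),
]
# The address the namenode logs for every datanode registration.
SEEN_BY_NAMENODE = ipaddress.ip_address("192.168.20.1")


def diagnose(seen, pod_ips, gateway, subnet):
    """Compare the address the namenode sees with the real pod addresses."""
    return {
        "pods_share_subnet": all(ip in subnet for ip in pod_ips),
        "seen_is_a_pod": seen in pod_ips,
        "seen_is_gateway": seen == gateway,
    }


if __name__ == "__main__":
    print(diagnose(SEEN_BY_NAMENODE, DATANODE_IPS, DEFAULT_GATEWAY, POD_SUBNET))
```

The address the namenode sees belongs to no datanode pod and equals the default gateway, which is why I suspect the routing rather than the Hadoop configuration.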
The issue is strikingly similar to the one described in "Why is Dockerized Hadoop datanode registering with the wrong IP address?", but here it occurs in the context of a Kubernetes cluster.
Any ideas?
Does this help?
"Famous last words Before you scale down the datanode StatefulSet, you need to tell Hadoop that one datanode will go away ;)"
See http://b4mad.net/datenbrei/openshift/hadoop-hdfs/
See also https://gitlab.com/goern/hdfs-openshift