Hadoop datanode routing issue on Kubernetes

12/5/2017

I'm trying to set up a sample Hadoop cluster on OpenShift/Kubernetes/Docker (OpenShift 3.5), and I've run into the following issue:

Only one datanode gets registered with the namenode at a time, because the namenode sees all datanodes under the same IP (192.168.20.1). This appears to be caused by a network route in the cluster.

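For illustration, the registrations can be inspected directly from the namenode pod; a minimal sketch, assuming the namenode pod name below and that the hdfs binary is on the PATH inside the image:

    # list the datanodes the namenode currently knows about, with their registered IPs
    oc rsh hadoop-namenode-10-qp83z hdfs dfsadmin -report

With the behaviour described here, every datanode reports as 192.168.20.1, which is why only one of them shows up at a time.
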
Current sample configuration:

Namenode

192.168.20.119  hadoop-namenode-10-qp83z

Datanodes

192.168.20.132  hadoop-slave-0.hadoop-slave.my-project.svc.cluster.local  hadoop-slave-0
192.168.20.133  hadoop-slave-1.hadoop-slave.my-project.svc.cluster.local  hadoop-slave-1
192.168.20.134  hadoop-slave-2.hadoop-slave.my-project.svc.cluster.local  hadoop-slave-2

Namenode log:

17/12/05 22:11:21 INFO net.NetworkTopology: Removing a node: /default-rack/192.168.20.1:50010
17/12/05 22:11:21 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.20.1:50010
17/12/05 22:11:21 INFO blockmanagement.BlockReportLeaseManager: Registered DN f3c22144-f9cf-47dc-b0b7-bf946121ee81 (192.168.20.1:50010).
17/12/05 22:11:21 INFO blockmanagement.DatanodeDescriptor: Adding new storage ID DS-6f7b2565-1e85-491a-ab04-69a7ffa25d5c for DN 192.168.20.1:50010
17/12/05 22:11:21 INFO BlockStateChange: BLOCK* processReport 0x9c1289bc1f9f766f: Processing first storage report for DS-6f7b2565-1e85-491a-ab04-69a7ffa25d5c from datanode f3c22144-f9cf-47dc-b0b7-bf946121ee81
17/12/05 22:11:21 INFO BlockStateChange: BLOCK* processReport 0x9c1289bc1f9f766f: from storage DS-6f7b2565-1e85-491a-ab04-69a7ffa25d5c node DatanodeRegistration(192.168.20.1, datanodeUuid=f3c22144-f9cf-47dc-b0b7-bf946121ee81, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-6b84af8f-fe9a-465a-840e-6acb0fe5f8d9;nsid=399770301;c=0), blocks: 0, hasStaleStorage: false, processing time: 0 msecs, invalidatedBlocks: 0
17/12/05 22:11:21 INFO hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(192.168.20.1, datanodeUuid=2bd926b9-b00e-4eb6-858d-3e90fa6b3ef8, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-6b84af8f-fe9a-465a-840e-6acb0fe5f8d9;nsid=399770301;c=0) storage 2bd926b9-b00e-4eb6-858d-3e90fa6b3ef8
17/12/05 22:11:21 INFO namenode.NameNode: BLOCK* registerDatanode: 192.168.20.1:50010

Configuration (hdfs-site.xml):

    <property>
        <name>dfs.datanode.use.datanode.hostname</name>
        <value>true</value> <!-- same result with false -->
    </property>   
    <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value> <!-- same result with false -->
    </property>

    <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
    </property>

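As a sanity check, the effective values inside a datanode pod can be read back with hdfs getconf; a small sketch, assuming hdfs is on the PATH in the pod image:

    # confirm the datanode pods actually picked up the settings above
    oc rsh hadoop-slave-0 hdfs getconf -confKey dfs.datanode.use.datanode.hostname
    oc rsh hadoop-slave-0 hdfs getconf -confKey dfs.namenode.datanode.registration.ip-hostname-check
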
Output of ip route on all pods:

ip route
default via 192.168.20.1 dev eth0
192.168.0.0/16 dev eth0
192.168.20.0/24 dev eth0 proto kernel scope link src 192.168.20.134
224.0.0.0/4 dev eth0

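One hedged reading of this routing table: if pod traffic towards the namenode is masqueraded through the bridge gateway 192.168.20.1, every datanode connection arrives with that source address. A rough way to check, assuming shell access to the underlying node and standard iptables tooling:

    # on the node hosting the pods: look for MASQUERADE/SNAT rules hitting pod traffic
    sudo iptables -t nat -S POSTROUTING | grep -i -e masquerade -e snat

    # from a datanode pod: which route and source address are chosen towards the namenode
    ip route get 192.168.20.119
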
The issue is strikingly similar to the one described in Why is Dockerized Hadoop datanode registering with the wrong IP address?, but now in the context of a Kubernetes cluster.

Any ideas?

-- Nikolay Voskresensky
hadoop
kubernetes

1 Answer

9/13/2018

Does this help?

"Famous last words Before you scale down the datanode StatefulSet, you need to tell Hadoop that one datanode will go away ;)"

See http://b4mad.net/datenbrei/openshift/hadoop-hdfs/
See also https://gitlab.com/goern/hdfs-openshift

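For context on "tell Hadoop that one datanode will go away": HDFS decommissioning is normally driven by the excludes file plus a refresh. A minimal sketch, assuming dfs.hosts.exclude in hdfs-site.xml already points at /etc/hadoop/dfs.exclude on the namenode (the path is illustrative):

    # on the namenode: mark the datanode that is about to be removed
    echo "hadoop-slave-2.hadoop-slave.my-project.svc.cluster.local" >> /etc/hadoop/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # wait until the node shows as "Decommissioned" before scaling the StatefulSet down
    hdfs dfsadmin -report
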
-- jseteny
Source: StackOverflow