So I have a K8s cluster up and running and I want to run Spark jobs on top of it.
Kubernetes is v1.15.3 and Spark v2.4.5.
For data storage I am thinking of using HDFS, but I do not want to install the entire Hadoop stack, which includes YARN and MapReduce (please correct me if I am wrong).
I have seen this repository as the only direct solution available online, but it's not working for me currently.
When I deploy it as described in the repo's README, multiple pods are created, and as soon as all of them reach the Running state, the my-hdfs-namenode-0 pod goes into the Error state and a lot of pods start crashing.
This is the error I get in the log from kubectl logs pod/my-hdfs-namenode-0:
20/05/11 09:47:57 ERROR namenode.NameNode: Failed to start namenode.
java.lang.IllegalArgumentException: Unable to construct journal, qjournal://my-hdfs-journalnode-1.my-hdfs-journalnode.default.svc.cluster.local:8485;my-hdfs-journalnode-2.my-hdfs-journalnode.default.svc.cluster.local:8485;my-hdfs-journalnode-0.my-hdfs-journalnode.default.svc.cluster.local:8485/hdfs-k8s
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1638)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:282)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:247)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:985)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1636)
... 5 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107)
at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91)
at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.<init>(IPCLoggerChannel.java:178)
at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:156)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:367)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:149)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:116)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:105)
... 10 more
I'm guessing it's a DNS (name resolution) related error? Here is the complete log for reference.
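(For anyone checking the same thing: DNS resolution of the JournalNode addresses from the qjournal URI in the error can be verified from inside the cluster with something like the following throwaway pod. This is just a diagnostic sketch; the hostname is copied from my error message above, and note that StatefulSet pod DNS entries for a headless service typically only resolve once the pod is ready.)

```shell
# Spin up a temporary busybox pod and try to resolve one of the
# JournalNode addresses that the namenode is failing to construct
# a journal for. The pod is deleted again after the command exits.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup my-hdfs-journalnode-0.my-hdfs-journalnode.default.svc.cluster.local
```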
Now, this repository isn't actively maintained, so can someone suggest either how I can resolve this error, or another way to deploy HDFS to my Kubernetes cluster?
In general, I suggest you don't use HDFS within k8s...
HDFS was designed before k8s persistent volumes were really thought about. The Hadoop Ozone project is still in development and is meant to work around these limitations.
In the meantime, I suggest you look into using MinIO or Project Rook (on CephFS), both of which offer a Hadoop-compatible file system (HCFS).
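With MinIO, for example, Spark can read and write through the s3a connector instead of HDFS. A rough sketch (the endpoint, credentials, bucket, and job script are all placeholders, and the hadoop-aws version must match the Hadoop version bundled with your Spark distribution):

```shell
# Sketch: point Spark's s3a connector at a MinIO endpoint.
# ACCESS_KEY, SECRET_KEY, the endpoint URL, and my_job.py are placeholders.
# hadoop-aws 2.7.7 matches a Hadoop 2.7.x Spark build; adjust to yours.
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:2.7.7 \
  --conf spark.hadoop.fs.s3a.endpoint=http://minio.default.svc.cluster.local:9000 \
  --conf spark.hadoop.fs.s3a.access.key=ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=SECRET_KEY \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  my_job.py s3a://my-bucket/input
```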
If you must use HDFS, then set it up outside k8s and make requests to it from within the containers.
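For example, Spark passes any `spark.hadoop.*` property through to the Hadoop configuration, so a job running inside the cluster can target an external namenode directly (the hostname below is hypothetical):

```shell
# Sketch: point a Spark job at an HDFS cluster running outside k8s.
# namenode.example.com is a placeholder for your real namenode address;
# 8020 is the conventional HDFS RPC port.
spark-submit \
  --conf spark.hadoop.fs.defaultFS=hdfs://namenode.example.com:8020 \
  my_job.py hdfs://namenode.example.com:8020/data/input
```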
Regarding YARN, make sure to watch the YuniKorn project (YARN on k8s).