H2O in Kubernetes

3/30/2017

Has anyone managed to run an H2O cluster in Kubernetes?

I tried two options, both using a flatfile:

1) A StatefulSet, but since the IP assigned to a pod can change, the cluster is unreliable.
2) A set of Service/Deployment pairs, with the DNS name of each Service listed in the flatfile, but the cluster doesn't start up correctly.

Neither of these works. Is there any way to make it work?

-- Alessandro Magnani
h2o
kubernetes

1 Answer

4/8/2017

If multicast packets can be transmitted between the pods, then you could rely on that for the cluster formation. Just give every node the same -name value, one that is unique to this cluster. This is easy if it works, with no code changes.
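As a minimal sketch of what that launch looks like, the Python helper below assembles the command line for one node. The helper itself, the jar path h2o.jar, and the cluster name k8s-h2o are assumptions for illustration; -name and -port are H2O launch options, and 54321 is H2O's default port.

```python
def h2o_multicast_command(cloud_name, jar_path="h2o.jar", port=54321):
    """Build the launch command for one H2O node that joins via multicast.

    Every pod runs the same command with the same cloud_name; no flatfile
    is passed, so H2O falls back to multicast discovery.
    jar_path is a hypothetical location of the H2O jar inside the image.
    """
    return ["java", "-jar", jar_path, "-name", cloud_name, "-port", str(port)]


if __name__ == "__main__":
    # Print the command a container entrypoint would run (illustrative name).
    print(" ".join(h2o_multicast_command("k8s-h2o")))
```

Nodes started with a different -name will not join this cluster, which is what keeps two H2O clusters on the same network apart.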

UPDATE (2018/04/21) -- one of my colleagues says:

I used Weave as the network layer. It provides a connection between all the containers for that Kubernetes pod group, so you don't need to use the flatfile in H2O: H2O will multicast on startup, and Weave will take the multicast and send it to all instances of the pod.

In K8s, run this: kubectl apply --filename https://git.io/weave-kube-1.6


If multicast is not an option, there isn't an out-of-the-box solution today for Kubernetes that I'm aware of.

You will need an orchestrator to distribute the flatfile information.
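To make the flatfile step concrete, here is a rough Python sketch of the part such an orchestrator would have to do: collect the current pod IPs and render them one ip:port per line, which is the format H2O reads with -flatfile. It is not an existing H2O or Kubernetes tool. The function names are hypothetical, and resolve_peers assumes the pods sit behind a headless Service whose DNS name resolves to the individual pod IPs.

```python
import socket

H2O_DEFAULT_PORT = 54321  # H2O's default node port


def resolve_peers(service_dns_name):
    """Return the sorted, de-duplicated IPv4 addresses behind a DNS name.

    Assumption: service_dns_name belongs to a headless Service, so the
    lookup yields one address per pod rather than a single cluster IP.
    """
    infos = socket.getaddrinfo(
        service_dns_name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def render_flatfile(ips, port=H2O_DEFAULT_PORT):
    """Render one ip:port entry per line for H2O's -flatfile option."""
    return "".join(f"{ip}:{port}\n" for ip in sorted(set(ips)))


if __name__ == "__main__":
    # Static addresses stand in for resolve_peers("<headless service name>").
    print(render_flatfile(["10.0.0.2", "10.0.0.1", "10.0.0.2"]), end="")
```

Each pod would write this output to a file before starting H2O and pass the path with -flatfile. The sketch only captures the addresses known at launch time, so it does not by itself solve the changing-IP problem described in the question.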

There are at least three examples of code that do this for other environments in the H2O GitHub repos.

  1. The EC2 scripts

https://github.com/h2oai/h2o-3/tree/master/ec2

  2. The Hadoop driver

https://github.com/h2oai/h2o-3/blob/master/h2o-hadoop/h2o-mapreduce-generic/src/main/java/water/hadoop/h2omapper.java

In particular, look at how this class gets overridden:

https://github.com/h2oai/h2o-3/blob/master/h2o-core/src/main/java/water/init/AbstractEmbeddedH2OConfig.java

  3. The Sparkling Water driver in the Sparkling Water repo.
-- TomKraljevic
Source: StackOverflow