How can I maintain a set of unique number-crunching containers in Kubernetes?

6/20/2019

I want to run a "set" of containers in Kubernetes, each of which differs only in its Docker environment variables (each one searches its own dataset, which is located on network storage and then cached in the container's RAM). For example:

  • container 1 -> Dataset 1
  • container 2 -> Dataset 2

Over time, I'll want to add (and sometimes remove) containers from this "set", but don't want to restart ALL of the containers when doing so.

From my (naive) knowledge of Kubernetes, the only way I can see to do this is:

  • Each container could be its own deployment -- However, there are thousands of containers, so that would be a pain to modify and manage.

So my questions are:

  1. Can I use a StatefulSet to manage this?

    1.1. When a StatefulSet is "updated", must it restart all pods, even if their "spec" is unchanged?
    1.2. Do StatefulSets allow for each unique container/pod to have its own environment variable(s)?

  2. Is there any Kubernetes concept to "group" deployments into some logical unit?

  3. Any other thoughts about how to implement this in Kubernetes?
  4. Would Docker Swarm (or another container management platform) be better suited to my use case?
-- intensity
kubernetes
kubernetes-deployment
kubernetes-pod
kubernetes-statefulset

2 Answers

7/9/2019

If you expect your containers to eventually be done with their workload and terminate (as opposed to processing a single item loaded in RAM forever), you should use a job queue such as Celery on top of Kubernetes to manage the execution. In this case Celery will do all the orchestration, including restarting jobs if they fail. This is much more manageable than using Kubernetes directly.
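To illustrate the job-queue idea, here is a minimal sketch of the pattern using only the Python standard library. It is not Celery and does not use Celery's API; `run_jobs`, `search`, and the dataset names are hypothetical and stand in for whatever the real workers would do. Each dataset becomes a job, workers pull jobs from a shared queue, and a failed job is put back on the queue until it runs out of attempts.

```python
import queue
import threading


def run_jobs(datasets, search, workers=2, max_attempts=3):
    """Run search(name) for every dataset, retrying failed jobs.

    Returns (results, failed): a dict of successful outcomes and a
    list of dataset names that failed max_attempts times.
    """
    jobs = queue.Queue()
    for name in datasets:
        jobs.put((name, 1))  # (dataset name, attempt number)

    results, failed = {}, []
    lock = threading.Lock()

    def worker():
        while True:
            name, attempt = jobs.get()
            try:
                outcome = search(name)
            except Exception:
                if attempt < max_attempts:
                    # Re-queue before task_done() so join() keeps waiting.
                    jobs.put((name, attempt + 1))
                else:
                    with lock:
                        failed.append(name)
            else:
                with lock:
                    results[name] = outcome
            finally:
                jobs.task_done()

    # Daemon threads so idle workers do not keep the process alive.
    for _ in range(workers):
        threading.Thread(target=worker, daemon=True).start()

    jobs.join()  # Blocks until every queued job is finished or given up on.
    return results, failed
```

With a real job queue, the broker and the workers would run as separate pods, and adding or removing a dataset would mean enqueueing or cancelling a job rather than changing the Kubernetes objects.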

Kubernetes even provides an official example of such a setup.

-- Shnatsel
Source: StackOverflow

7/3/2019

According to your description, a StatefulSet is what you need.

1.1. When a StatefulSet is "updated", must it restart all pods, even if their "spec" is unchanged?

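You can choose a suitable update strategy. I suggest RollingUpdate, but you can try whichever suits you. Note that changing only the number of replicas does not alter the pod template, so scaling up or down adds or removes pods without restarting the rest.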

Also check out this tutorial.

1.2 Do StatefulSets allow for each unique container/pod to have its own environment variable(s)?

Yes, because their naming is consistent (name-0, name-1, name-2, etc.). Each pod can read the ordinal index from its hostname (which is the pod name) and use it to select its own settings.
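As a rough sketch of how that could look inside the container, the code below extracts the ordinal from the hostname and uses it to pick a dataset. The function names, the `cruncher` pod name, and the idea of passing every pod the same ordered list of datasets are assumptions for illustration, not something the StatefulSet API provides.

```python
import re
from typing import Sequence


def pod_ordinal(hostname: str) -> int:
    """Return the trailing ordinal of a StatefulSet pod name.

    For example, 'cruncher-12' gives 12.
    """
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError(f"hostname {hostname!r} does not end in '-<ordinal>'")
    return int(match.group(1))


def dataset_for(hostname: str, datasets: Sequence[str]) -> str:
    """Pick this pod's dataset by using its ordinal as a list index."""
    index = pod_ordinal(hostname)
    if index >= len(datasets):
        raise IndexError(f"no dataset configured for ordinal {index}")
    return datasets[index]
```

At startup the container could call `dataset_for(socket.gethostname(), datasets)`, where `datasets` comes from a shared environment variable or a mounted config file that is identical for all pods. Keep in mind that a StatefulSet scales down by removing the highest ordinals first, so removing a dataset from the middle of the list shifts the mapping for every pod after it.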

Please let me know if that helped.

-- OhHiMark
Source: StackOverflow