I want to run a "set" of containers in Kubernetes, each of which differs only in its Docker environment variables (each one searches its own dataset, which is located on network storage and then cached in the container's RAM). For example:
Over time, I'll want to add containers to (and sometimes remove them from) this "set", but I don't want to restart ALL of the containers when doing so.
From my (naive) knowledge of Kubernetes, the only way I can see to do this is:
So my questions are:
1. Can I use a StatefulSet to manage this?
1.1. When a StatefulSet is "updated", must it restart all pods, even if their "spec" is unchanged?
1.2. Do StatefulSets allow for each unique container/pod to have its own environment variable(s)?
2. Is there any Kubernetes concept to "group" deployments into some logical unit?
If you expect your containers to eventually be done with their workload and terminate (as opposed to processing a single item loaded in RAM forever), you should use a job queue such as Celery on top of Kubernetes to manage the execution. In this case Celery will do all the orchestration, including restarting jobs if they fail. This is much more manageable than using Kubernetes directly.
Kubernetes even provides an official example of such a setup.
Based on your description, a StatefulSet is what you need.
1.1. When a StatefulSet is "updated", must it restart all pods, even if their "spec" is unchanged?
You can choose a suitable update strategy. I suggest `RollingUpdate`, but you can try whatever suits you.
Also check out this tutorial.
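To make the update strategy concrete, here is a minimal sketch that builds a StatefulSet manifest as a Python dict and prints it as JSON (which `kubectl apply -f` accepts alongside YAML). The name `searcher`, the image `example/searcher:1.0`, and the helper `build_statefulset` are made up for illustration. With `RollingUpdate`, pods are replaced only when the pod template changes; changing only `replicas` adds or removes the highest-numbered pods and leaves the others running. The optional `partition` field limits a template update to pods whose ordinal is greater than or equal to that value.

```python
import json


def build_statefulset(name, image, replicas, partition=0):
    """Return a StatefulSet manifest as a plain dict (illustrative helper)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Name of the governing (headless) Service; assumed to match the set name here.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "updateStrategy": {
                "type": "RollingUpdate",
                # Only pods with ordinal >= partition are updated on a template change.
                "rollingUpdate": {"partition": partition},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image, for illustration only.
    manifest = build_statefulset("searcher", "example/searcher:1.0", replicas=3)
    print(json.dumps(manifest, indent=2))
```

Growing the set from 3 to 4 replicas would then only mean regenerating the manifest with `replicas=4` and applying it again; the pod template is unchanged, so the existing pods should keep running.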
1.2 Do StatefulSets allow for each unique container/pod to have its own environment variable(s)?
Yes, indirectly, because their naming is consistent (`name-0`, `name-1`, `name-2`, etc.). Each pod can read the ordinal index from its hostname (which is the pod name) and use it to select its own settings.
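As a rough sketch of the hostname-index idea, the container's entrypoint could parse the ordinal from the pod name and use it to pick a dataset. The `DATASETS` list, its paths, and both function names are hypothetical; the only assumption taken from StatefulSets is that pod names end in `-<ordinal>`.

```python
import re

# Hypothetical dataset locations on network storage, one per pod ordinal.
DATASETS = [
    "/mnt/datasets/alpha",
    "/mnt/datasets/beta",
    "/mnt/datasets/gamma",
]


def ordinal_from_hostname(hostname):
    """Extract the trailing ordinal from a StatefulSet pod name such as 'name-2'."""
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError(f"hostname {hostname!r} does not end in '-<ordinal>'")
    return int(match.group(1))


def dataset_for(hostname, datasets=DATASETS):
    """Return the dataset assigned to the pod with this hostname."""
    index = ordinal_from_hostname(hostname)
    if index >= len(datasets):
        raise IndexError(f"no dataset configured for ordinal {index}")
    return datasets[index]


if __name__ == "__main__":
    # Fixed example name; inside a pod you would pass socket.gethostname() instead.
    print(dataset_for("name-1"))
```

With this approach, adding a dataset means appending an entry to the mapping and increasing `replicas` by one. Note that a StatefulSet removes pods from the highest ordinal down, so removing a dataset in the middle of the list would need a different mapping scheme.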
Please let me know if that helped.