I understood that
StatefulSet - manages/maintains stable hostname, network ID and persistent storage.
Headless Service - for a stable network ID you need to define a headless service for stateful applications.
From the K8s docs -> Sometimes you don’t need or want load-balancing and a single service IP. In this case, you can create “headless” services by specifying "None" for the cluster IP (.spec.clusterIP).
My thoughts on "Stateful vs Stateless" apps/components
UI comes under stateless applications/components because it doesn't maintain any data; it only gets data from the DB and displays it.
DB and Cache (Redis) are stateful applications/components, because they have to maintain data.
My Questions.
Persistent storage in apps - Why should I consider deploying Postgres (for example) as a StatefulSet? I can define PVs and PVCs in a Deployment to store the data in a PV. Even if pods restart, they will get their PV back, so there is no loss of data.
Network - Redis (for example) should be deployed as a StatefulSet, so that we get a unique "Network ID"/name every time, even after pods restart. For example, Redis-0 and Redis-1 are in a StatefulSet; I can define Redis-0 as master, so the master name never changes. Now why should I consider a Headless Service for StatefulSet apps? I can directly access/connect to the pods themselves, right? What is the use of a Headless Service?
I heard about Operators, a best way to manage StatefulSet apps. I found some examples below. Why is it important to deploy those (or some others) as a StatefulSet? For example, Prometheus or ElasticSearch; I can define PVs and PVCs to store data without loss.
Why/When should I care about StatefulSet and Headless Service?
Before trying to answer some of your questions I must add a disclaimer: there are different ways to skin a cat. And since we are discussing StatefulSets here, note that not all approaches are best suited for all stateful applications. If you need a single database pod with a single PV, you could take one approach; if your API pod needs some shared and some separate PVs, then another; and so on.
Persistent storage in apps - Why should I consider deploying Postgres (for example) as a StatefulSet? I can define PVs and PVCs in a Deployment to store the data in a PV.
This holds true only if all your pods use the same persistent volume claim across all replicas (and the provisioner allows that). If you increase the number of replicas of a Deployment, all your pods will use the very same PVC. A StatefulSet, on the other hand, as defined in the API documentation, has volumeClaimTemplates, allowing each replica to have its own generated PVC, which secures a separately provisioned PV for each pod in the replica set.
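To make the difference concrete, here is a minimal Python sketch (an illustration, not code from Kubernetes itself) of which claim each replica ends up with. It relies on the documented StatefulSet convention that a PVC generated from volumeClaimTemplates is named `<template name>-<StatefulSet name>-<ordinal>`. The names `postgres`, `data` and `pgdata-shared`, and the Deployment pod names, are made up for the example.

```python
def statefulset_claims(sts_name: str, template_name: str, replicas: int) -> dict:
    """Map each StatefulSet pod to the PVC generated for it.

    Pods are named <sts_name>-<ordinal>; each gets its own claim named
    <template_name>-<sts_name>-<ordinal>.
    """
    return {
        f"{sts_name}-{i}": f"{template_name}-{sts_name}-{i}"
        for i in range(replicas)
    }


def deployment_claims(pod_names: list, claim_name: str) -> dict:
    """Map each Deployment pod to the single PVC named in the pod template."""
    return {pod: claim_name for pod in pod_names}


if __name__ == "__main__":
    # Every StatefulSet replica has a distinct claim.
    print(statefulset_claims("postgres", "data", 3))
    # Every Deployment replica points at the same claim (hypothetical pod names).
    print(deployment_claims(["postgres-6c9f-abcde", "postgres-6c9f-fghij"], "pgdata-shared"))
```

Scaling the StatefulSet from 3 to 4 replicas would add a new `data-postgres-3` claim, while scaling the Deployment only adds another pod pointing at `pgdata-shared`.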
Now why should I consider Headless Service for StatefulSet apps?
Because of ease of discovery. Again, you don't need to know how many replicas sit behind the Headless Service: by checking the service DNS you will get ALL replicas (caveat: only those that are up and running at that moment). You can do it manually, but then you rely on a different mechanism for counting/keeping tabs on replicas (replicas self-registering with the master, for example). Here is a nice example of pod discovery with nslookup that can shed some light on why headless can be a nice idea.
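As a rough illustration of that discovery, the Python sketch below resolves a Service DNS name and returns every IP behind it. It assumes the code runs inside the cluster, where the cluster DNS answers for a headless Service with one record per ready pod. The `resolver` parameter and the function names are my own additions, there only so the logic can be exercised outside a cluster. The second helper builds the stable per-pod DNS name that a StatefulSet pod gets under its governing headless Service, assuming the default `cluster.local` cluster domain.

```python
import socket


def discover_peers(service_host, port=None, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPs that a Service DNS name resolves to.

    For a headless Service this is the list of pods that are ready right now;
    for a regular Service it is just the single ClusterIP.
    """
    infos = resolver(service_host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def pod_fqdn(sts_name, ordinal, service, namespace="default",
             cluster_domain="cluster.local"):
    """Build the stable DNS name of one StatefulSet pod behind a headless Service."""
    return f"{sts_name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    # Only meaningful inside a cluster that has a headless Service with this name.
    print(pod_fqdn("zookeeper", 0, "zookeeper"))
    print(discover_peers("zookeeper.default.svc.cluster.local"))
```

A client that wants "the master" can connect to the fixed name from `pod_fqdn`, while a client that wants "everyone" can call `discover_peers` without knowing the replica count.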
Why is it important to deploy those (or some others) as a StatefulSet?
To my understanding, the very Operators you listed are themselves deployed using a Deployment. They handle StatefulSets though, so let's consider ElasticSearch as an example. If it were not deployed as a StatefulSet, you would end up with two pods targeting the same PV (if the provisioner allows it), and that would heavily mess things up. With a StatefulSet, each pod gets its very own persistent volume claim (from the template) and consequently a persistent volume separate from the other ElasticSearch pods in the same StatefulSet. This is just the tip of the iceberg, since ElasticSearch is more complex to set up and handle, and operators help with that.
Why/When should I care about StatefulSet and Headless Service?
You should use a StatefulSet in any case where replicated pods need to have PVs separate from each other (created from a PVC template and automatically provisioned).
You should use a Headless Service in any case where you want to automatically discover all pods under the service, as opposed to a regular Service, where you get a ClusterIP instead. As an illustration from the above-mentioned example, here is the difference between the DNS entries for a Service (with ClusterIP) and a Headless Service (without ClusterIP):
Standard service - you will get the clusterIP value:
kubectl exec zookeeper-0 -- nslookup zookeeper
Server: 10.0.0.10
Address: 10.0.0.10#53
Name: zookeeper.default.svc.cluster.local
Address: 10.0.0.213
Headless service - you will get the IP of each Pod:
kubectl exec zookeeper-0 -- nslookup zookeeper
Server: 10.0.0.10
Address: 10.0.0.10#53
Name: zookeeper.default.svc.cluster.local
Address: 172.17.0.6
Name: zookeeper.default.svc.cluster.local
Address: 172.17.0.7
Name: zookeeper.default.svc.cluster.local
Address: 172.17.0.8
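The only difference between the two Services behind these lookups is the `.spec.clusterIP` field. The Python sketch below builds both manifests as plain dictionaries so they can be compared side by side. The `app: zookeeper` selector label is an assumption about how the pods are labelled, 2181 is ZooKeeper's usual client port, and `service_manifest` is a helper name invented for this example.

```python
import json


def service_manifest(name, selector, port, headless=False):
    """Build a minimal Service manifest; headless=True sets clusterIP to "None"."""
    spec = {"selector": selector, "ports": [{"port": port}]}
    if headless:
        # The literal string "None" tells Kubernetes not to allocate a cluster IP.
        spec["clusterIP"] = "None"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    regular = service_manifest("zookeeper", {"app": "zookeeper"}, 2181)
    headless = service_manifest("zookeeper", {"app": "zookeeper"}, 2181, headless=True)
    print(json.dumps(regular, indent=2))
    print(json.dumps(headless, indent=2))
```

Since kubectl accepts JSON manifests as well as YAML, either printed document could be saved to a file and applied with `kubectl apply -f`.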