K8s DaemonSet high availability

2/26/2022

We have a DaemonSet (not our own) that we want to make highly available (HA). Does the following also apply to a DaemonSet?

  • affinity (anti-affinity)
  • tolerations
  • PDB (PodDisruptionBudget)

We have 3 worker nodes on each cluster. I've done this in the past for a Deployment, but I'm not sure what also applies to a DaemonSet. This is not our app, but we need to make sure it is HA because it's a critical app.

update

Does it make sense to add the following to the DaemonSet? Say I have 3 worker nodes and I want it to be scheduled only on the foo worker nodes.

spec:
  tolerations:
    - effect: NoSchedule
      key: WorkGroup
      operator: Equal
      value: foo
    - effect: NoExecute
      key: WorkGroup
      operator: Equal
      value: foo
  nodeSelector:
    workpcloud.io/group: foo
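
For reference, this assumes the foo nodes carry a matching taint and label, something like (node name hypothetical):

apiVersion: v1
kind: Node
metadata:
  name: worker-1                  # hypothetical node name
  labels:
    workpcloud.io/group: foo      # matched by the nodeSelector above
spec:
  taints:
    - key: WorkGroup
      value: foo
      effect: NoSchedule          # matched by the first toleration above
    - key: WorkGroup
      value: foo
      effect: NoExecute           # matched by the second toleration above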
-- Alberto
amazon-web-services
google-cloud-platform
high-availability
kubernetes

2 Answers

2/26/2022

You cannot control the replica count in a DaemonSet, as a DaemonSet runs one pod per node.

You need to change the object to either a Deployment or a StatefulSet to manage the replica count, and use a nodeSelector to deploy it onto the nodes you want.
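
A minimal sketch of that approach, with hypothetical names and image:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-app              # hypothetical name
spec:
  replicas: 3                     # replica count you control, unlike with a DaemonSet
  selector:
    matchLabels:
      app: critical-app
  template:
    metadata:
      labels:
        app: critical-app
    spec:
      nodeSelector:
        workpcloud.io/group: foo  # reusing the node label from the question
      containers:
        - name: critical-app
          image: example/critical-app:1.0   # hypothetical image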

-- Vijay Daswani
Source: StackOverflow

2/26/2022

You have asked two, somewhat unrelated questions.

Does the following also apply to a DaemonSet?

  • affinity (anti-affinity)
  • tolerations
  • PDB (PodDisruptionBudget)

A DaemonSet (generally) runs on a policy of "one pod per node": you CAN'T make it HA in that sense (for example, by using autoscaling), and (assuming you use defaults) you will have as many replicas of the DaemonSet as you have nodes, unless you explicitly specify which nodes should run the DaemonSet pods, using things like nodeSelector and/or tolerations, in which case you will have fewer pods. The Kubernetes DaemonSet documentation gives more details and has some examples.
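
For example, a DaemonSet restricted to the foo nodes carries the nodeSelector and tolerations in its pod template; a minimal sketch, with hypothetical names and image:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: critical-agent            # hypothetical name
spec:
  selector:
    matchLabels:
      app: critical-agent
  template:
    metadata:
      labels:
        app: critical-agent
    spec:
      nodeSelector:
        workpcloud.io/group: foo  # only nodes with this label run a pod
      tolerations:
        - key: WorkGroup
          operator: Equal
          value: foo
          effect: NoSchedule
      containers:
        - name: agent
          image: example/agent:1.0   # hypothetical image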

This is not our app, but we need to make sure it is HA because it's a critical app.

Are you asking how to make your critical app HA? I'm going to assume you are.

If the app is as critical as you say, then a few starter recommendations:

  1. Make sure you have at least 3 replicas (4 is a good starting number)
  2. Add tolerations if you must schedule those pods on a node pool that has taints
  3. Use node selectors as needed (e.g. for regions or zones, but only if necessary due to something like disks being present in those zones)
  4. Use affinity to group or spread your replicas. I would definitely recommend a spread, so that if one node goes down the other replicas are still up (see the sketch after this list)
  5. Use a pod priority to indicate to the cluster that your pods are more important than other pods (beware this may cause issues if you set it too high)
  6. Set up notifications to something like PagerDuty, OpsGenie, etc., so you (or your ops team) are notified if the app goes down. If the app is critical, then you'll want to know it's down ASAP.
  7. Set up PodDisruptionBudgets and HorizontalPodAutoscalers to ensure an agreed number of pods is always up (see the sketch after this list).
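
As a rough illustration of items 4 and 7, assuming the app's pods carry a hypothetical app: critical-app label:

# Spread replicas across nodes (goes in the workload's pod template).
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app: critical-app   # hypothetical label
---
# Keep at least 2 pods up during voluntary disruptions (e.g. node drains).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-app-pdb              # hypothetical name
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: critical-app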
-- Blender Fox
Source: StackOverflow