AKS: omsagent-win pods restart again and again

10/11/2021

omsagent-win is the pod in the kube-system namespace that AKS deploys when Azure Monitor Container insights is enabled. I run a hybrid cluster with both Windows and Linux nodes.

Output of kubectl get nodes:

NAME                            STATUS   ROLES   AGE     VERSION
aks-nplin-21116150-vmss000002   Ready    agent   2d21h   v1.21.2
aks-nplin-21116150-vmss000003   Ready    agent   2d21h   v1.21.2
aks-nplin-21116150-vmss000006   Ready    agent   2d21h   v1.21.2
aksnpwin000003                  Ready    agent   2d20h   v1.21.2
aksnpwin000004                  Ready    agent   2d20h   v1.21.2
aksnpwin000005                  Ready    agent   2d20h   v1.21.2

On the Linux nodes everything works fine.

NAME                                    READY   STATUS    RESTARTS   AGE     IP             NODE                            NOMINATED NODE   READINESS GATES
omsagent-xscbv                          2/2     Running   0          2d21h   10.240.1.108   aks-nplin-21116150-vmss000006   <none>           <none>
omsagent-k2zlx                          2/2     Running   0          2d21h   10.240.0.137   aks-nplin-21116150-vmss000002   <none>           <none>
omsagent-pzd4s                          2/2     Running   0          2d21h   10.240.0.79    aks-nplin-21116150-vmss000003   <none>           <none>

But on the Windows nodes the pods restart constantly. I have also checked the nodeSelector.

NAME                                    READY   STATUS    RESTARTS   AGE     IP             NODE                            NOMINATED NODE   READINESS GATES
omsagent-win-2vwqd                      1/1     Running   283        2d20h   10.240.2.64    aksnpwin000005                  <none>           <none>
omsagent-win-5kz2h                      1/1     Running   73         2d20h   10.240.1.178   aksnpwin000003                  <none>           <none>
omsagent-win-gmwk6                      1/1     Running   25         2d20h   10.240.1.46    aksnpwin000004                  <none>           <none>
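
For reference, the nodeSelector check mentioned above can be reproduced with something like the following sketch. The DaemonSet name omsagent-win is an assumption inferred from the pod names, and the helper function names are made up for illustration.

```shell
# Sketch: print the nodeSelector of a DaemonSet in kube-system.
# Assumption: the DaemonSet is called "omsagent-win" (inferred from the pod names).
show_node_selector() {
  ds="${1:-omsagent-win}"
  kubectl -n kube-system get daemonset "$ds" \
    -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}'
}

# Sketch: list all nodes with their OS label as an extra column,
# so the selector can be compared against what the nodes actually carry.
show_node_os() {
  kubectl get nodes -L kubernetes.io/os
}
```

Calling show_node_selector and then show_node_os should make it easy to confirm that the selector only matches the Windows nodes.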

Output of kubectl -n kube-system describe pod omsagent-win-2vwqd:

Events:
  Type     Reason     Age                    From     Message
  ----     ------     ----                   ----     -------
  Warning  Unhealthy  10m (x950 over 2d20h)  kubelet  Liveness probe failed:
  Normal   Killing    19s (x708 over 2d20h)  kubelet  Container omsagent-win failed liveness probe, will be restarted
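
Since the event message after "Liveness probe failed:" is empty, two further things that could be looked at are the probe definition itself and the logs of the container instance that was killed. The following is only a sketch; the helper function names are hypothetical, and it assumes the pod has a single container, as the 1/1 in the listing suggests.

```shell
# Sketch: print the liveness probe configured on the first container of a pod.
# Assumption: the pod has only one container (READY shows 1/1).
show_liveness_probe() {
  kubectl -n kube-system get pod "$1" \
    -o jsonpath='{.spec.containers[0].livenessProbe}{"\n"}'
}

# Sketch: print the last lines of the logs from the previous (killed)
# container instance. The second argument is the line count (default 100).
show_previous_logs() {
  kubectl -n kube-system logs "$1" --previous --tail="${2:-100}"
}
```

For example, show_liveness_probe omsagent-win-2vwqd followed by show_previous_logs omsagent-win-2vwqd 50 would show what the probe runs and what the agent wrote before it was restarted.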

I have already tried giving the pods more CPU and RAM. That worked at first, but after a while (about 30 minutes) the resource settings revert to their original values.
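
To see whether and when the CPU and memory settings revert, the resources section of the DaemonSet can be printed before and after the change. This is a minimal sketch; the DaemonSet name omsagent-win is again an assumption inferred from the pod names, and show_resources is a hypothetical helper.

```shell
# Sketch: print the resource requests/limits of the first container
# in the DaemonSet's pod template.
# Assumption: the DaemonSet is called "omsagent-win".
show_resources() {
  ds="${1:-omsagent-win}"
  kubectl -n kube-system get daemonset "$ds" \
    -o jsonpath='{.spec.template.spec.containers[0].resources}{"\n"}'
}
```

Running show_resources right after the change and again about 30 minutes later should show whether the DaemonSet itself is being reset.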

Any ideas on how else I could investigate this?

Thanks in advance!

-- Andreas Enti
azure-aks
kubernetes
kubernetes-pod
