AWS Kubernetes cluster created with kops: kube-dns and kube-proxy go down

6/5/2018

I have created a Kubernetes cluster using kops on AWS. The cluster is created without any issues and runs fine for 10-15 hours. I have deployed SAP Vora 2.1 on this cluster. However, usually after 12-15 hours the cluster runs into problems with kube-proxy and kube-dns. These pods either go down or show up in a Completed state, and they restart many times. This eventually causes my application pods to fail, and the application goes down as well. The application uses Consul for service discovery, but because the underlying Kubernetes services are not working properly, the application does not return to a steady state even if I try to restore the kube-proxy/kube-dns pods.

This is a 3-node cluster (1 master and 2 nodes) set up in fully autoscaling mode. Networking uses the default kubenet. Below is a snapshot of the pod statuses once the system runs into issues:

[root@ip-172-31-18-162 ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                                                       READY     STATUS                                                 RESTARTS   AGE
infyvora      vora-catalog-1549734119-cfnhz                                              0/2       CrashLoopBackOff                                       188        20h
infyvora      vora-consul-0                                                              0/1       CrashLoopBackOff                                       101        20h
infyvora      vora-consul-1                                                              1/1       Running                                                34         20h
infyvora      vora-consul-2                                                              0/1       CrashLoopBackOff                                       95         20h
infyvora      vora-deployment-operator-293895365-4b3t6                                   0/1       Completed                                              104        20h
infyvora      vora-disk-0                                                                1/2       CrashLoopBackOff                                       187        20h
infyvora      vora-dlog-0                                                                0/2       CrashLoopBackOff                                       226        20h
infyvora      vora-dlog-1                                                                1/2       CrashLoopBackOff                                       155        20h
infyvora      vora-doc-store-2451237348-dkrm6                                            0/2       CrashLoopBackOff                                       229        20h
infyvora      vora-elasticsearch-logging-v1-444540252-mwfrz                              0/1       CrashLoopBackOff                                       100        20h
infyvora      vora-elasticsearch-logging-v1-444540252-vrr63                              1/1       Running                                                14         20h
infyvora      vora-elasticsearch-retention-policy-137762458-ns5pc                        1/1       Running                                                13         20h
infyvora      vora-fluentd-kubernetes-v1.21-9f4pt                                        1/1       Running                                                12         20h
infyvora      vora-fluentd-kubernetes-v1.21-s2t1j                                        0/1       CrashLoopBackOff                                       99         20h
infyvora      vora-grafana-2929546178-vrf5h                                              1/1       Running                                                13         20h
infyvora      vora-graph-435594712-47lcg                                                 0/2       CrashLoopBackOff                                       157        20h
infyvora      vora-kibana-logging-3693794794-2qn86                                       0/1       CrashLoopBackOff                                       99         20h
infyvora      vora-landscape-2532068267-w1f5n                                            0/2       CrashLoopBackOff                                       232        20h
infyvora      vora-nats-streaming-1569990702-kcl1v                                       1/1       Running                                                13         20h
infyvora      vora-prometheus-node-exporter-k4c3g                                        0/1       CrashLoopBackOff                                       102        20h
infyvora      vora-prometheus-node-exporter-xp511                                        1/1       Running                                                13         20h
infyvora      vora-prometheus-pushgateway-399610745-tcfk7                                0/1       CrashLoopBackOff                                       103        20h
infyvora      vora-prometheus-server-3955170982-xpct0                                    2/2       Running                                                24         20h
infyvora      vora-relational-376953862-w39tc                                            0/2       CrashLoopBackOff                                       237        20h
infyvora      vora-security-operator-2514524099-7ld0k                                    0/1       CrashLoopBackOff                                       103        20h
infyvora      vora-thriftserver-409431919-8c1x9                                          2/2       Running                                                28         20h
infyvora      vora-time-series-1188816986-f2fbq                                          1/2       CrashLoopBackOff                                       184        20h
infyvora      vora-tools5tlpt-100252330-mrr9k                                            0/1       rpc error: code = 4 desc = context deadline exceeded   272        17h
infyvora      vora-tools6zr3m-3592177467-n7sxd                                           0/1       Completed                                              1          20h
infyvora      vora-tx-broker-4168728922-hf8jz                                            0/2       CrashLoopBackOff                                       151        20h
infyvora      vora-tx-coordinator-3910571185-l0r4n                                       0/2       CrashLoopBackOff                                       184        20h
infyvora      vora-tx-lock-manager-2734670982-bn7kk                                      0/2       Completed                                              228        20h
infyvora      vsystem-1230763370-5ckr0                                                   0/1       CrashLoopBackOff                                       115        20h
infyvora      vsystem-auth-1068224543-0g59w                                              0/1       CrashLoopBackOff                                       102        20h
infyvora      vsystem-vrep-1427606801-zprlr                                              0/1       CrashLoopBackOff                                       121        20h
kube-system   dns-controller-3110272648-chwrs                                            1/1       Running                                                0          22h
kube-system   etcd-server-events-ip-172-31-64-102.ap-southeast-1.compute.internal        1/1       Running                                                0          22h
kube-system   etcd-server-ip-172-31-64-102.ap-southeast-1.compute.internal               1/1       Running                                                0          22h
kube-system   kube-apiserver-ip-172-31-64-102.ap-southeast-1.compute.internal            1/1       Running                                                0          22h
kube-system   kube-controller-manager-ip-172-31-64-102.ap-southeast-1.compute.internal   1/1       Running                                                0          22h
kube-system   kube-dns-1311260920-cm1fs                                                  0/3       Completed                                              309        22h
kube-system   kube-dns-1311260920-hm5zd                                                  3/3       Running                                                39         22h
kube-system   kube-dns-autoscaler-1818915203-wmztj                                       1/1       Running                                                12         22h
kube-system   kube-proxy-ip-172-31-64-102.ap-southeast-1.compute.internal                1/1       Running                                                0          22h
kube-system   kube-proxy-ip-172-31-64-110.ap-southeast-1.compute.internal                0/1       CrashLoopBackOff                                       98         22h
kube-system   kube-proxy-ip-172-31-64-15.ap-southeast-1.compute.internal                 1/1       Running                                                13         22h
kube-system   kube-scheduler-ip-172-31-64-102.ap-southeast-1.compute.internal            1/1       Running                                                0          22h
kube-system   tiller-deploy-352283156-97hhb                                              1/1       Running                                                34         22h
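
Since the listing above is long, here is a minimal sketch of how it could be tallied by namespace and status to see at a glance how much is failing. It only parses the pasted text of `kubectl get pods --all-namespaces`. The function name `summarize_pods` is hypothetical, and the parsing assumes columns are separated by two or more spaces, as in the output above.

```python
import re
from collections import Counter


def summarize_pods(listing: str) -> dict:
    """Count pods per (namespace, status) from pasted `kubectl get pods` text."""
    counts = Counter()
    for line in listing.splitlines():
        # Assumption: columns are separated by runs of 2+ spaces, so a STATUS
        # like "rpc error: code = 4 desc = ..." (single spaces) stays intact.
        fields = re.split(r"\s{2,}", line.strip())
        # Skip the header, the shell prompt line, and anything malformed.
        if len(fields) != 6 or fields[0] == "NAMESPACE":
            continue
        namespace, _name, _ready, status, _restarts, _age = fields
        counts[(namespace, status)] += 1
    return dict(counts)
```

Calling `summarize_pods(text)` on the saved output returns a dictionary keyed by `(namespace, status)`, which makes it easier to compare snapshots taken at different times.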

Has anyone come across a similar issue with a kops-managed Kubernetes cluster on AWS? Any pointers on how to solve this would be appreciated.

Regards, Deepak

-- Deepak Jadhav
kops
kube-dns
kube-proxy
kubernetes

0 Answers