My k8s cluster had been running fine for two days when it started behaving strangely.
My specific question is about kube-proxy: it is no longer updating iptables.
From the kube-proxy logs, I can see that it failed to connect to the kubernetes apiserver (in my case the path is kube-proxy --> HAProxy --> k8s apiserver), yet the pod is still shown as Running.
Question: I expect the kube-proxy pod to be down (not Running) if it is unable to register with the apiserver for events.
How do I achieve this behavior via liveness probes?
Note: after killing the pod, kube-proxy works fine again.
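Here is roughly how I recover today (the pod name below is just an example; find the real one first):

# find the kube-proxy pod running on the affected node
kubectl -n kube-system get pods -o wide | grep kube-proxy
# delete it; the DaemonSet controller recreates it immediately
kubectl -n kube-system delete pod kube-proxy-xxxxx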
sudo docker logs 1de375c94fd4 -f
W0910 15:18:22.091902 1 server.go:195] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
I0910 15:18:22.091962 1 feature_gate.go:226] feature gates: &{{} map[]}
time="2018-09-10T15:18:22Z" level=warning msg="Running modprobe ip_vs failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.15.0-33-generic/modules.dep.bin'\nmodprobe: WARNING: Module ip_vs not found in directory /lib/modules/4.15.0-33-generic`, error: exit status 1"
time="2018-09-10T15:18:22Z" level=error msg="Could not get ipvs family information from the kernel. It is possible that ipvs is not enabled in your kernel. Native loadbalancing will not work until this is fixed."
I0910 15:18:22.185086 1 server.go:409] Neither kubeconfig file nor master URL was specified. Falling back to in-cluster config.
I0910 15:18:22.186885 1 server_others.go:140] Using iptables Proxier.
W0910 15:18:22.438408 1 server.go:601] Failed to retrieve node info: nodes "$(node_name)" not found
W0910 15:18:22.438494 1 proxier.go:306] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
I0910 15:18:22.438595 1 server_others.go:174] Tearing down inactive rules.
I0910 15:18:22.861478 1 server.go:444] Version: v1.10.2
I0910 15:18:22.867003 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 2883584
I0910 15:18:22.867046 1 conntrack.go:52] Setting nf_conntrack_max to 2883584
I0910 15:18:22.867267 1 conntrack.go:83] Setting conntrack hashsize to 720896
I0910 15:18:22.893396 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0910 15:18:22.893505 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0910 15:18:22.893737 1 config.go:102] Starting endpoints config controller
I0910 15:18:22.893749 1 controller_utils.go:1019] Waiting for caches to sync for endpoints config controller
I0910 15:18:22.893742 1 config.go:202] Starting service config controller
I0910 15:18:22.893765 1 controller_utils.go:1019] Waiting for caches to sync for service config controller
I0910 15:18:22.993904 1 controller_utils.go:1026] Caches are synced for endpoints config controller
I0910 15:18:22.993921 1 controller_utils.go:1026] Caches are synced for service config controller
W0910 16:13:28.276082 1 reflector.go:341] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: watch of *core.Endpoints ended with: very short watch: k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Unexpected watch close - watch lasted less than a second and no items received
W0910 16:13:28.276083 1 reflector.go:341] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: watch of *core.Service ended with: very short watch: k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Unexpected watch close - watch lasted less than a second and no items received
E0910 16:13:29.276678 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:29.276677 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:30.277201 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:30.278009 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:31.277723 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:31.278574 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:32.278197 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:32.279134 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:33.278684 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Endpoints: Get https://127.0.0.1:6553/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
E0910 16:13:33.279587 1 reflector.go:205] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:86: Failed to list *core.Service: Get https://127.0.0.1:6553/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6553: getsockopt: connection refused
Question: I expect the kube-proxy pod to be down (not Running) if it is unable to register with the apiserver for events.
kube-proxy is not supposed to go down in this situation. It watches the kube-apiserver for Service and Endpoints events and reprograms iptables whenever something changes; between events it keeps working from its cached state, which keeps the iptables rules on your node consistent. Kubernetes is deliberately designed this way: if the kube-apiserver or other master components go down, traffic to existing Services should keep flowing through the nodes with no downtime.
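You can see this on a node yourself: even while kube-proxy is logging connection-refused errors, the rules it programmed earlier are still loaded (KUBE-SERVICES and KUBE-SVC-* are the standard chains kube-proxy manages in iptables mode):

# previously programmed Service rules survive an apiserver outage
sudo iptables-save | grep KUBE-SERVICES | head
sudo iptables-save | grep -c KUBE-SVC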
How do I achieve this behavior via liveness probes?
You can always add a liveness probe to the kube-proxy DaemonSet, but it's not a recommended practice:
spec:
  containers:
  - command:
    - /usr/local/bin/kube-proxy
    - --config=/var/lib/kube-proxy/config.conf
    image: k8s.gcr.io/kube-proxy-amd64:v1.11.2
    imagePullPolicy: IfNotPresent
    name: kube-proxy
    resources: {}
    securityContext:
      privileged: true
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10256
      initialDelaySeconds: 5
      periodSeconds: 5
The probe uses an httpGet against kube-proxy's own healthz endpoint rather than exec'ing curl, which may not be present in the kube-proxy image. Note that 10256 is kube-proxy's healthz port, not the apiserver's: make sure the healthz server is enabled on kube-proxy itself (--healthz-port, or healthzBindAddress in the config file; it defaults to 0.0.0.0:10256). One caveat: as far as I know, /healthz reports on kube-proxy's own rule syncing, so it will not necessarily turn unhealthy just because its watch on the apiserver dropped, which is part of why this isn't recommended.
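Before relying on the probe, it's worth verifying the endpoint by hand from the node (assuming the default bind address of 0.0.0.0:10256):

# expect an HTTP 200 while kube-proxy considers itself healthy
curl -i http://127.0.0.1:10256/healthz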