Oftentimes, when I want to check out what's wrong with Pods that go into a CrashLoopBackOff or Error state, I do the following: I change the pod command to sleep 10000 and run kubectl exec -ti POD_NAME bash in my terminal to further inspect the environment and the code. The problem is that it terminates very soon, without exception, and it has been quite annoying to inspect the contents of my pod.
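For reference, the override I apply is roughly the following (a minimal sketch of the containers section of my Deployment template; the name and image are placeholders, not my actual chart values):

containers:
  - name: myapp                  # placeholder container name
    image: myrepo/myapp:latest   # placeholder image
    command: ["sleep", "10000"]  # replaces the original entrypoint so I can exec in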
My config
The result of kubectl version:
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-15T15:50:38Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.7", GitCommit:"6c143d35bb11d74970e7bc0b6c45b6bfdffc0bd4", GitTreeState:"clean", BuildDate:"2019-12-11T12:34:17Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
The result of helm version:
version.BuildInfo{Version:"v3.0.1", GitCommit:"7c22ef9ce89e0ebeb7125ba2ebf7d421f3e82ffa", GitTreeState:"clean", GoVersion:"go1.13.4"}
OS: macOS Catalina 10.15.2
Docker version: 19.03.5
I run my stuff using helm and helmfile, and my releases usually include a Deployment and a Service.
Let me know if any additional info can help.
Any help is appreciated!
You can do something like this:
kubectl exec -it --request-timeout=500s POD_NAME bash
Try to install Go version 1.13.4 or later. Your kubectl server is built with go1.12.12, which causes a lot of compatibility problems, so you have to update it. If you are upgrading from an older version of Go, you must first remove the existing version. Take a look here: upgrading-golang.
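For example, on Linux a typical upgrade looks like the sketch below (assuming the default /usr/local/go install location and the go1.13.4 linux-amd64 tarball; adjust the version, OS, and architecture to your case):

# remove the existing Go installation first
sudo rm -rf /usr/local/go
# download and unpack the new version
curl -LO https://dl.google.com/go/go1.13.4.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.13.4.linux-amd64.tar.gz
# verify the new toolchain is picked up
go version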
Apply the changes in your pod definition file by adding the following lines under the container definition:
#Just spin & wait forever
command: [ "/bin/bash", "-c", "--" ]
args: [ "trap : TERM INT; sleep infinity & wait" ]
This will keep your container alive until it is told to stop. Using trap and wait makes your container react immediately to a stop request; without trap/wait, stopping takes a few seconds.
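In context, a minimal Deployment using these lines could look like the sketch below (the name and image are placeholders, not taken from the question):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-debug                  # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp-debug
  template:
    metadata:
      labels:
        app: myapp-debug
    spec:
      containers:
        - name: myapp                  # placeholder container name
          image: myrepo/myapp:latest   # placeholder image
          # Just spin & wait forever
          command: [ "/bin/bash", "-c", "--" ]
          args: [ "trap : TERM INT; sleep infinity & wait" ]

Once the pod is Running, you can inspect it with kubectl exec -ti POD_NAME -- bash.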
If you think it is a networking problem, use tcpdump.
Tcpdump is a tool that captures network traffic and helps you troubleshoot some common networking problems. Here is a quick way to capture traffic on the host to the target container with IP 172.28.21.3.
We are going to exec into one container and try to reach another container:
kubectl exec -ti testbox-2460950909-5wdr4 -- /bin/bash
$ curl http://ip:port
On the host running the container, we capture traffic related to the target container IP:
$ tcpdump -i any host ip
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
20:15:59.903566 IP 172.28.128.132.60358 > 172.28.21.3.5000: Flags [S], seq 3042274422, win 28200, options [mss 1410,sackOK,TS val 10056152 ecr 0,nop,wscale 7], length 0
20:15:59.903566 IP 172.28.128.132.60358 > 172.28.21.3.5000: Flags [S], seq 3042274422, win 28200, options [mss 1410,sackOK,TS val 10056152 ecr 0,nop,wscale 7], length 0
20:15:59.905481 ARP, Request who-has 172.28.21.3 tell 10.244.27.0, length 28
20:16:00.907463 ARP, Request who-has 172.28.21.3 tell 10.244.27.0, length 28
20:16:01.909440 ARP, Request who-has 172.28.21.3 tell 10.244.27.0, length 28
20:16:02.911774 IP 172.28.128.132.60358 > 172.28.21.3.5000: Flags [S], seq 3042274422, win 28200, options [mss 1410,sackOK,TS val 10059160 ecr 0,nop,wscale 7], length 0
20:16:02.911774 IP 172.28.128.132.60358 > 172.28.21.3.5000: Flags [S], seq 3042274422, win 28200, options [mss 1410,sackOK,TS val 10059160 ecr 0,nop,wscale 7], length 0
As you can see, there is trouble on the wire: the kernel fails to route the packets to the target IP (the ARP requests for 172.28.21.3 go unanswered and the SYN packets are retransmitted).
You can also debug the pod using the kubectl logs command:
Running kubectl logs -p will fetch logs from existing resources at the API level. This means that the logs of terminated pods will be unavailable with this command.
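For example (POD_NAME and CONTAINER_NAME are placeholders):

# logs of the current container instance
kubectl logs POD_NAME
# logs of the previous (crashed) container instance, if the pod object still exists
kubectl logs -p POD_NAME
# for multi-container pods, select the container explicitly
kubectl logs -p POD_NAME -c CONTAINER_NAME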
The best way is to have your logs centralized via logging agents or by pushing them directly to an external service.
Alternatively, given the logging architecture in Kubernetes, you might be able to fetch the logs directly from the log-rotate files on the node hosting the pods. However, this option depends on the Kubernetes implementation, as log files might be deleted when pod eviction is triggered.
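As a rough sketch, on a typical Linux node with the default log layout (paths may differ depending on the distribution and container runtime), the per-pod files live under /var/log/pods and /var/log/containers:

# on the node hosting the pod (e.g. over SSH)
ls /var/log/pods/
ls -l /var/log/containers/    # symlinks to the per-container log files
# follow a specific container's log (the file name pattern is runtime-dependent)
tail -f /var/log/containers/POD_NAME_NAMESPACE_CONTAINER_NAME-*.log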
Take a look here: pod-debugging.
Take a look at the official documentation: kubectl-exec.