I am getting a CrashLoopBackOff error for one of my four pods. Please guide me on how to troubleshoot this issue.
$ kubectl get pod -n cog-prod01 -o wide
slotmachine-1688723297-5vlht 1/1 Running 0 21h 100.96.6.15 ip-172-21-61-42.compute.internal
slotmachine-1688723297-6plr9 1/1 Running 0 16h 100.96.13.16 ip-172-21-54-247.compute.internal
slotmachine-1688723297-k995t 1/1 Running 0 16h 100.96.11.186 ip-172-21-60-180.compute.internal
slotmachine-1688723297-sk8bn 0/1 CrashLoopBackOff 8 19m 100.96.2.72 ip-172-21-56-148.compute.internal
Kubelet logs on the node:
admin@ip-172-21-56-148:~$ journalctl -u kubelet -f
Jan 07 02:44:36 ip-172-21-56-148 kubelet[1568]: W0107 02:44:36.351880 1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: W0107 02:44:46.372270 1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: I0107 02:44:46.443776 1568 kuberuntime_manager.go:463] Container {Name:slotmachine Image:gt/slotmachine:develop.6590.b3a.2866 Command:[] Args:[] WorkingDir: Ports:[{Name:slotmachine HostPort:0 ContainerPort:9192 Protocol:TCP HostIP:}] EnvFrom:[{Prefix: ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:global,},Optional:nil,} SecretRef:nil}] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:5 scale:9} d:{Dec:<nil>} s:5G Format:DecimalSI}]} VolumeMounts:[{Name:slotmachine-logs ReadOnly:false MountPath:/var/log/slotmachine SubPath:} {Name:default-token-9bxjf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: I0107 02:44:46.443851 1568 kuberuntime_manager.go:747] checking backoff for container "slotmachine" in pod "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: I0107 02:44:46.592800 1568 kubelet.go:1917] SyncLoop (PLEG): "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)", event: &pleg.PodLifecycleEvent{ID:"2bc8665e-30f5-11ea-a92d-024aeca0bafc", Type:"ContainerStarted", Data:"5b2868d22c3e5453e57a58cba78cea4979a7da9a0864be2f29049d47d19fa41b"}
Jan 07 02:44:56 ip-172-21-56-148 kubelet[1568]: W0107 02:44:56.409374 1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.669027 1568 kubelet.go:1917] SyncLoop (PLEG): "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)", event: &pleg.PodLifecycleEvent{ID:"2bc8665e-30f5-11ea-a92d-024aeca0bafc", Type:"ContainerDied", Data:"5b2868d22c3e5453e57a58cba78cea4979a7da9a0864be2f29049d47d19fa41b"}
Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.971547 1568 kuberuntime_manager.go:463] Container {Name:slotmachine Image:gt/slotmachine:develop.6590.b3aa.2866 Command:[] Args:[] WorkingDir: Ports:[{Name:slotmachine HostPort:0 ContainerPort:9192 Protocol:TCP HostIP:}] EnvFrom:[{Prefix: ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:global,},Optional:nil,} SecretRef:nil}] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:5 scale:9} d:{Dec:<nil>} s:5G Format:DecimalSI}]} VolumeMounts:[{Name:slotmachine-logs ReadOnly:false MountPath:/var/log/slotmachine SubPath:} {Name:default-token-9bxjf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.971640 1568 kuberuntime_manager.go:747] checking backoff for container "slotmachine" in pod "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.971770 1568 kuberuntime_manager.go:757] Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)
Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: E0107 02:45:00.971805 1568 pod_workers.go:182] Error syncing pod 2bc8665e-30f5-11ea-a92d-024aeca0bafc ("slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"), skipping: failed to "StartContainer" for "slotmachine" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
Jan 07 02:45:06 ip-172-21-56-148 kubelet[1568]: W0107 02:45:06.447068 1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.149685 1568 status_manager.go:418] Status for pod "2bc8665e-30f5-11ea-a92d-024aeca0bafc" is up-to-date; skipping
Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.443951 1568 kuberuntime_manager.go:463] Container {Name:slotmachine Image:gt/slotmachine:develop.6590.b35a.2866 Command:[] Args:[] WorkingDir: Ports:[{Name:slotmachine HostPort:0 ContainerPort:9192 Protocol:TCP HostIP:}] EnvFrom:[{Prefix: ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:global,},Optional:nil,} SecretRef:nil}] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:5 scale:9} d:{Dec:<nil>} s:5G Format:DecimalSI}]} VolumeMounts:[{Name:slotmachine-logs ReadOnly:false MountPath:/var/log/slotmachine SubPath:} {Name:default-token-9bxjf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.444070 1568 kuberuntime_manager.go:747] checking backoff for container "slotmachine" in pod "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.444198 1568 kuberuntime_manager.go:757] Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)
Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: E0107 02:45:12.444238 1568 pod_workers.go:182] Error syncing pod 2bc8665e-30f5-11ea-a92d-024aeca0bafc ("slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"), skipping: failed to "StartContainer" for "slotmachine" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
Jan 07 02:45:13 ip-172-21-56-148 kubelet[1568]: I0107 02:45:13.938976 1568 qos_container_manager_linux.go:286] [ContainerManager]: Updated QoS cgroup configuration
Jan 07 02:45:16 ip-172-21-56-148 kubelet[1568]: W0107 02:45:16.464693 1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
admin@ip-172-21-43-86:~$ kubectl describe po -n cog-prod01 slotmachine-1688723297-sk8bn
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
27m 27m 1 default-scheduler Normal Scheduled Successfully assigned slotmachine-1688723297-sk8bn to ip-172-21-56-148.compute.internal
27m 27m 1 kubelet, ip-172-21-56-148.compute.internal Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "slotmachine-logs"
27m 27m 1 kubelet, ip-172-21-56-148.compute.internal Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-9bxjf"
27m 4m 10 kubelet, ip-172-21-56-148.compute.internal spec.containers{slotmachine} Normal Pulled Container image "gt/slotmachine:develop.6590.xxxx.2866" already present on machine
27m 4m 10 kubelet, ip-172-21-56-148.compute.internal spec.containers{slotmachine} Normal Created Created container
27m 4m 10 kubelet, ip-172-21-56-148.compute.internal spec.containers{slotmachine} Normal Started Started container
27m 11s 113 kubelet, ip-172-21-56-148.compute.internal spec.containers{slotmachine} Warning BackOff Back-off restarting failed container
27m 11s 113 kubelet, ip-172-21-56-148.compute.internal Warning FailedSync Error syncing pod
Note: I checked disk space, CPU, and memory on the node running that pod, and they are fine. According to the pod logs, it cannot connect to the config service, but the other three pods can connect to it, so I cannot figure out what is wrong here.
admin@ip-172-21-43-86:~$ kubectl logs -n cog-prod01 slotmachine-1688723297-sk8bn
03:01:02.104 [main] INFO org.springframework.cloud.config.client.ConfigServicePropertySourceLocator - Fetching config from server at: http://configservice:8888
03:01:05.344 [main] WARN org.springframework.cloud.config.client.ConfigServicePropertySourceLocator - Could not locate PropertySource: I/O error on GET request for "http://configservice:8888/slotmachine/cog,cog-prod01": No route to host (Host unreachable); nested exception is java.net.NoRouteToHostException: No route to host (Host unreachable)
03:01:05.381 [main] INFO org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext - Refreshing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@77eca502: startup date [Tue Jan 07 03:01:05 UTC 2020]; parent: org.springframework.context.annotation.AnnotationConfigApplicationContext@4fb0f2b9
Not enough capacity is available on the node or nodes, so the scheduler is not able to deploy your 4th pod. You can check this with kubectl describe nodes. For a detailed explanation, have a look at my answer to "GKE Insufficient CPU for small Node.js app pods".
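A rough sketch of how that capacity check could be scripted. The helper names node_allocation and pod_scheduled are made up for illustration; the node, namespace, and pod names in the usage note come from the question. Keep in mind that a capacity shortage normally shows up as a Pending pod with FailedScheduling events, while the events in the question show the pod was scheduled, so this check may simply rule that cause out.

```shell
# Sketch only: helper names are hypothetical, not part of kubectl.

# Print the "Allocated resources" section of a node description,
# which lists summed requests/limits against the node's allocatable capacity.
node_allocation() {
  kubectl describe node "$1" | sed -n '/^Allocated resources:/,/^Events:/p'
}

# Print the PodScheduled condition of a pod.
# "True" means the scheduler found a node for it.
# Arguments: namespace, pod name.
pod_scheduled() {
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")].status}'
}
```

For the pod in the question this would be called as node_allocation ip-172-21-56-148.compute.internal and pod_scheduled cog-prod01 slotmachine-1688723297-sk8bn. If the second call prints True, scheduling succeeded and the crash has to be explained by something else, such as the connectivity error in the pod logs.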
Check whether kube-proxy is working properly on your nodes, in particular on ip-172-21-56-148, the only node where the pod cannot reach configservice.
Here is a guide on debugging kube-proxy.
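A minimal sketch of the first checks I would run. It assumes kube-proxy runs as pods in kube-system whose names contain "kube-proxy", and that configservice is a Service in the cog-prod01 namespace. Neither is confirmed by the question, and the helper names are invented for illustration.

```shell
# Sketch only: helper names are hypothetical; adjust namespaces to your cluster.

# List the kube-proxy pod(s) placed on a given node.
# Exits non-zero if none is found (assumes pod names contain "kube-proxy").
kube_proxy_on_node() {
  kubectl get pods -n kube-system -o wide | grep kube-proxy | grep -- "$1"
}

# Show the endpoints behind a Service. An empty ENDPOINTS column means
# no ready backend pods match the Service selector.
# Arguments: namespace, service name.
service_endpoints() {
  kubectl get endpoints "$2" -n "$1"
}

# On the affected node itself, and only if kube-proxy runs in iptables mode,
# the rules it programmed for the Service can be inspected with:
#   sudo iptables-save | grep configservice
```

For example, kube_proxy_on_node ip-172-21-56-148 followed by service_endpoints cog-prod01 configservice. If kube-proxy is running and the endpoints look healthy, compare the iptables output on this node with one of the three nodes where the pod works.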