I am a Kubernetes newbie, and I am running out of ideas for fixing Pods that are stuck in the ContainerCreating status. I am working through a sample application from AWS (https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html#eks-guestbook), which is very similar to the official Kubernetes sample (https://kubernetes.io/docs/tutorials/stateless-application/guestbook/).
Many thanks to anyone who can offer guidance in finding the root cause. Why do I get the "connection refused" error, and what does port 50051 do?
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                        READY   STATUS              RESTARTS   AGE
default       guestbook-8k9pp             0/1     ContainerCreating   0          15h
default       guestbook-b2n49             0/1     ContainerCreating   0          15h
default       guestbook-gtjnj             0/1     ContainerCreating   0          15h
default       redis-master-rhwnt          0/1     ContainerCreating   0          15h
default       redis-slave-b284x           0/1     ContainerCreating   0          15h
default       redis-slave-vnlj4           0/1     ContainerCreating   0          15h
kube-system   aws-node-jkfg8              0/1     CrashLoopBackOff    273        1d
kube-system   aws-node-lpvn9              0/1     CrashLoopBackOff    273        1d
kube-system   aws-node-nmwzn              0/1     Error               274        1d
kube-system   kube-dns-64b69465b4-ftlm6   0/3     ContainerCreating   0          4d
kube-system   kube-proxy-cxdj7            1/1     Running             0          1d
kube-system   kube-proxy-g2js4            1/1     Running             0          1d
kube-system   kube-proxy-rhq6v            1/1     Running             0          1d
$ kubectl describe pod guestbook-8k9pp
Name:           guestbook-8k9pp
Namespace:      default
Node:           ip-172-31-91-242.ec2.internal/172.31.91.242
Start Time:     Wed, 31 Oct 2018 04:59:11 -0800
Labels:         app=guestbook
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  ReplicationController/guestbook
Containers:
  guestbook:
    Container ID:
    Image:          k8s.gcr.io/guestbook:v3
    Image ID:
    Port:           3000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jb75l (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  default-token-jb75l:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-jb75l
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 11m (x19561 over 13h) kubelet, ip-172-31-91-242.ec2.internal Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 74s (x19368 over 13h) kubelet, ip-172-31-91-242.ec2.internal Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "guestbook-8k9pp_default" network: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: **desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"**
I had a similar problem: the same error message, but with a much simpler set of Pods. Running kubectl get pods --all-namespaces revealed that one particular node had a pod in CrashLoopBackOff.
I scaled in my nodes and then scaled out again (effectively re-creating that node), and the problem seems to have gone away.
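To spot this kind of thing quickly, a small shell filter over the kubectl get pods --all-namespaces output can help. This is only a sketch: the function name not_running is made up for illustration, and it assumes the six-column layout (NAMESPACE NAME READY STATUS RESTARTS AGE) shown in the question. As far as I know, the aws-node pods run the Amazon VPC CNI plugin, whose local daemon is what the CNI dials on 127.0.0.1:50051, so a crashing aws-node pod would be consistent with the "connection refused" error. Treat that as my understanding rather than something confirmed in this thread.

```shell
# Hypothetical helper: print "namespace/name: status" for every pod whose
# STATUS column is not "Running". Assumes the six-column output of
# `kubectl get pods --all-namespaces` and reads that output from stdin.
not_running() {
  awk 'NR > 1 && $4 != "Running" { print $1 "/" $2 ": " $4 }'
}

# Usage against a live cluster (needs kubectl configured for it):
#   kubectl get pods --all-namespaces | not_running
#
# Then look at the logs of a failing pod, for example one of the
# aws-node pods from the question (--previous shows the crashed run):
#   kubectl logs -n kube-system aws-node-jkfg8 --previous
```

Note that the filter also lists pods in states such as Completed, which are not necessarily a problem.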
The Kubernetes cluster is on AWS EKS; I created it manually through the EKS console.
I have since created a second cluster using the official VPC sample for EKS (https://amazon-eks.s3-us-west-2.amazonaws.com/cloudformation/2018-08-30/amazon-eks-vpc-sample.yaml), and it seems to be working now.
So the problem is most likely in the VPC configuration. Once I figure out what actually went wrong, I will post the details here. Thank you.