I am running a GKE cluster with two node pools.
1st node pool: 1 node, no autoscaling (4 vCPU, 16 GB RAM)
2nd node pool: 1 node, autoscaling up to 2 nodes (1 vCPU, 3.75 GB RAM each)
Output of kubectl top node: (screenshot)
We started the cluster with a single node running Elasticsearch, Redis, RabbitMQ, and all the microservices. We cannot add more nodes to the 1st node pool, as that would waste resources; the 1st node can satisfy all resource requirements.
Only one microservice's pod keeps restarting: the core service pod. When I describe the pod, it shows the container was terminated with exit code 137 (Error).
In the GKE Stackdriver graphs, neither memory nor CPU usage reaches the limits.

Utilization of all pods in the cluster: (screenshot)
In the cluster logs I found this warning:

0/3 nodes are available: 3 Insufficient CPU.

But the 3 nodes together have around 6 vCPU in total, which should be more than enough.
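Worth noting: the scheduler does not compare a pod against a node's total vCPU. It compares the pod's CPU *request* against each node's remaining allocatable CPU (allocatable minus the sum of requests already placed there), so "Insufficient CPU" can appear even when actual CPU usage is low. A minimal sketch of that filter (the node sizes mirror the question; the already-requested amounts are made-up numbers for illustration):

```python
# Illustrative only: the scheduler filters nodes by comparing a pod's CPU
# *request* (in millicores) to each node's remaining allocatable CPU,
# not to total capacity or actual usage. Existing requests are assumptions.
nodes = {
    "highmem-node": {"allocatable_m": 3920, "requested_m": 3800},  # 4 vCPU node
    "small-node-1": {"allocatable_m": 940, "requested_m": 800},    # 1 vCPU node
    "small-node-2": {"allocatable_m": 940, "requested_m": 900},    # 1 vCPU node
}

pod_cpu_request_m = 200  # the core service requests 200m CPU


def schedulable(nodes, request_m):
    """Return the names of nodes that still have room for the request."""
    return [
        name for name, n in nodes.items()
        if n["allocatable_m"] - n["requested_m"] >= request_m
    ]


print(schedulable(nodes, pod_cpu_request_m))  # → [] i.e. "0/3 nodes are available"
```

With these numbers every node has less than 200m of unrequested CPU left, which reproduces the "0/3 nodes are available: 3 Insufficient CPU" message despite ~6 vCPU of raw capacity.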
I also see this error:
Memory cgroup out of memory: Kill process 3383411 (python3) score 2046 or sacrifice child Killed process 3384902 (python3) total-vm:14356kB, anon-rss:5688kB, file-rss:4572kB, shmem-rss:0kB
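This log line is the key: the kernel's cgroup OOM killer is terminating the process because the container exceeded its memory limit. Exit code 137 encodes exactly that, since a process killed by a signal exits with 128 plus the signal number, and the OOM killer sends SIGKILL (9):

```python
import signal

# A process killed by a signal exits with code 128 + signal number.
# The cgroup OOM killer sends SIGKILL (signal 9), so the exit code is 137.
exit_code = 128 + int(signal.SIGKILL)
print(exit_code)  # → 137
```

So despite the Stackdriver graphs, the container did hit its 1Gi memory limit at some instant (node-level graphs can average away short spikes).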
EDIT 1: kubectl describe pod output:
Name:               test-core-7fc8bbcb4c-vrbtw
Namespace:          default
Priority:           0
Node:               gke-test-cluster-highmem-pool-gen2-f2743e02-msv2/10.128.0.7
Start Time:         Fri, 17 Jan 2020 19:59:54 +0530
Labels:             app=test-core
                    pod-template-hash=7fc8bbcb4c
                    tier=frontend
Annotations:        <none>
Status:             Running
IP:                 10.40.0.41
IPs:                <none>
Controlled By:      ReplicaSet/test-core-7fc8bbcb4c
Containers:
  test-core:
    Container ID:   docker://0cc49c15ed852e99361590ee421a9193e10e7740b7373450174f549e9ba1d7b5
    Image:          gcr.io/test-production/core/production:fc30db4
    Image ID:       docker-pullable://gcr.io/test-production/core/production@sha256:b5dsd03b57sdfsa6035ff5ba9735984c3aa714bb4c9bb92f998ce0392ae31d055fe
    Ports:          9595/TCP, 443/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Sun, 19 Jan 2020 14:54:52 +0530
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sun, 19 Jan 2020 07:36:42 +0530
      Finished:     Sun, 19 Jan 2020 14:54:51 +0530
    Ready:          True
    Restart Count:  7
    Limits:
      cpu:     990m
      memory:  1Gi
    Requests:
      cpu:     200m
      memory:  128Mi
    Liveness:   http-get http://:9595/k8/liveness delay=25s timeout=5s period=5s #success=1 #failure=30
    Readiness:  http-get http://:9595/k8/readiness delay=25s timeout=8s period=5s #success=1 #failure=30
    Environment Variables from:
      test-secret             Secret     Optional: false
      core-staging-configmap  ConfigMap  Optional: false
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-hcz6d:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hcz6d
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
Please help. Thank you in advance.
The application running in the pod may be consuming more memory than the specified limit. You can kubectl exec (or docker exec) into the container and monitor the application using top. From the perspective of managing the whole cluster, this is done with cAdvisor (which is part of the kubelet) plus Heapster; Heapster has since been replaced by the Kubernetes Metrics Server (https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring).
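Once inside the container, the practical check is how close the process's memory footprint is to the container's limit, since the OOM killer fires the moment the cgroup's usage reaches it. A minimal sketch of that headroom calculation, assuming hypothetical usage numbers against the 1Gi limit shown in the describe output:

```python
# A minimal sketch: compare a container's memory usage against its limit,
# the way the cgroup OOM killer effectively does. The usage figure below
# is a made-up example; in a real container you would read it from the
# cgroup memory accounting files or from `top` after exec-ing in.

def memory_headroom(usage_bytes, limit_bytes):
    """Return (remaining bytes, percent of the limit used)."""
    return limit_bytes - usage_bytes, 100 * usage_bytes / limit_bytes


limit = 1 * 1024**3    # the pod's memory limit: 1Gi
usage = 990 * 1024**2  # hypothetical current usage: 990Mi

free, pct = memory_headroom(usage, limit)
print(f"{free // 1024**2}Mi free ({pct:.0f}% used)")  # → 34Mi free (97% used)
```

If the process regularly sits this close to the limit, either the limit needs to be raised or the application's memory use (e.g. a leak in the python3 process from the OOM log) needs to be investigated.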