TL;DR
I'm having trouble with an AWS EKS cluster running application pods on top of Fargate serverless nodes.
Here are the steps to reproduce it (see the command sketch after the manifest):
1. Create the pod below in the "default" namespace (plus a second copy with a different name, per the comment in the manifest).
2. Exec into the pod in the "default" namespace.
3. Allocate more memory than the pod's limit, e.g. python3 -c "bytearray(512000000)" (install python3 first if needed), so the pod gets OOMKilled and restarted.
Here's the pod manifest I'm using, but it can be any pod:
apiVersion: v1
kind: Pod
metadata:
  name: root-shell
spec:
  containers:
  - command: ["sh", "-c"]
    args:
    - sleep infinity
    image: docker.io/library/alpine
    # Make two files and pick a different name for the second pod
    name: root-shell
    resources:
      limits:
        memory: 64Mi
        cpu: 50m
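The exact commands I run look roughly like this; it's a sketch that assumes the manifest above is saved as root-shell.yaml (the pod name comes from the manifest):

kubectl apply -n default -f root-shell.yaml
kubectl get pod root-shell -n default -o wide       # note which Fargate node it lands on
kubectl exec -it root-shell -n default -- sh
# inside the pod: install python3, then allocate more than the 64Mi limit
apk add --no-cache python3
python3 -c "bytearray(512000000)"
# the container gets OOMKilled; watch it restart on the same Fargate node
kubectl get pod root-shell -n default -w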
Long description
My issue popped up this week when one of the pods completely lost its network connectivity. After some debugging I figured out that the pod was using more memory than it had been assigned. This caused k8s to OOMKill and restart the pod. After two or three restarts, the pod lost its network settings: nslookup and other requests to the outside stopped working, while all other pods in the cluster kept working just fine. I also noticed that pinging the Fargate node stopped working, so I'm assuming something goes wrong when the pod is restarted on the same Fargate node. If I manually delete the pod, k8s re-schedules a new pod on a new Fargate node, where things work as they're supposed to.
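For reference, these are roughly the checks and the manual workaround I ended up with; the pod/file names are the ones from the manifest above and the node IP is a placeholder (a sketch, not exact commands or output):

# from inside the broken pod, DNS and outbound traffic no longer work
kubectl exec -it root-shell -n default -- nslookup kubernetes.default
kubectl exec -it root-shell -n default -- ping -c 3 <fargate-node-ip>   # placeholder IP
# manual workaround: delete the pod and re-create it, so it lands on a fresh Fargate node
kubectl delete pod root-shell -n default
kubectl apply -n default -f root-shell.yaml
kubectl get pod root-shell -n default -o wide   # now on a different node, networking works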
Any hints are welcome!
I don't have an answer off the top of my head for the restart issue (I will look into it and try to recreate the problem). However, I wanted to point out that we read the "requests" to size the pod, not the "limits". See here.
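To illustrate, here is a rough sketch of the same pod with the sizes also set under "requests" (same names as in your manifest, just an example rather than a recommendation), so Fargate has something to size the pod from:

kubectl apply -n default -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: root-shell
spec:
  containers:
  - command: ["sh", "-c"]
    args:
    - sleep infinity
    image: docker.io/library/alpine
    name: root-shell
    resources:
      requests:            # Fargate sizes the pod from these values
        memory: 64Mi
        cpu: 50m
      limits:
        memory: 64Mi
        cpu: 50m
EOF

Once it's running, the capacity Fargate actually provisioned should show up as a CapacityProvisioned annotation on the pod (visible with kubectl describe pod), if I remember correctly.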