Extreme latency in Django API running on EKS

5/29/2019

I have deployed a Django REST API container on Amazon EKS (Kubernetes), but even a trivial HttpResponse view takes around 4-8 seconds:

    def home(request):
        return HttpResponse("Homepage")

Here is my stack:

  • EKS with 3 worker nodes, each a t2.medium instance (2 vCPU, 4GB RAM)
  • ELB: an L7 Application Load Balancer directing requests to 2 different services:
    • echoserver: to test simple response time
    • DjangoAPI
  • Containers:

    • DjangoAPI
    • Redis - cache
    • echoheaders: Simple echo server (gcr.io/google_containers/echoserver:1.4)
    NAME                               READY     STATUS    RESTARTS   AGE
    pod/djangoapi-65799dd6dc-gkpfp     1/1       Running   0          1h
    pod/echoheaders-5cff747d7d-n4jnt   1/1       Running   0          1h
    pod/redis-7d9fbf54cd-lpffv         1/1       Running   0          1h
    
    NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
    service/djangoapi-svc   NodePort    10.100.41.163    <none>        8000:30327/TCP   11d
    service/echoheaders     NodePort    10.100.94.85     <none>        80:30317/TCP     12d
    service/kubernetes      ClusterIP   10.100.0.1       <none>        443/TCP          35d
    service/redis           ClusterIP   10.100.210.207   <none>        6379/TCP         35d
    
    NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/djangoapi     1         1         1            1           11d
    deployment.apps/echoheaders   1         1         1            1           12d
    deployment.apps/redis         1         1         1            1           35d

When the same DjangoAPI image was deployed to my local setup (minikube), the average response time was around 200ms.

Dockerfile for Django:

FROM python:3.7

ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

RUN pip install --upgrade pip

RUN mkdir /app
WORKDIR /app
COPY . /app/
RUN pip install -r requirements.txt
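# No CMD/ENTRYPOINT here: the gunicorn command is supplied via the args in the
# Deployment manifest below.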

Django Deployment & Service yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: djangoapi
    type: web
  name: djangoapi
  namespace: "default"
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: djangoapi
        type: web
    spec:
      containers:
      - name: djangoapi
        image: wbivan/app:speed
        imagePullPolicy: Always
        args:
        - gunicorn
        - api.wsgi
        - --bind
        - 0.0.0.0:8000
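        # No --workers/--threads flags are passed, so gunicorn runs with its
        # default of a single synchronous worker.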
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        envFrom:
        - configMapRef:
            name: djangoapi-config
        ports:
        - containerPort: 8000
      imagePullSecrets:
        - name: regcred
      restartPolicy: Always

---
apiVersion: v1
kind: Service
metadata:
  name: djangoapi-svc
  namespace: "default"
  labels:
    app: djangoapi
spec:
  ports:
  - port: 8000
    protocol: TCP
    targetPort: 8000
  selector:
    app: djangoapi
    type: web
  type: NodePort  

Things I've tried

  1. Adjusting EC2 instance size from t2.small to t2.medium
  2. Adding more DjangoAPI replicas on Kubernetes deployment (from 1 to 3)
  3. Setting and removing resource limits on each deployment, e.g.:

    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

but none of the above made much difference.

  4. I then suspected the Elastic Load Balancer was the problem, which is why I deployed the echo server (using the image gcr.io/google_containers/echoserver:1.4). However, most requests to it only took ~50ms.

    # This took ~50ms
      - host: echo.mydomain.com
        http:
          paths:
          - path: /*
            backend:
              serviceName: echoheaders
              servicePort: 80
    
    # This took ~ 8000ms!!!
      - host: django.mydomain.com
        http:
          paths:
          - path: /*
            backend:
              serviceName: djangoapi-svc
              servicePort: 8000
  5. So clearly the Django service was the problem. To verify this, I compared the same deployment on EKS and on my local machine (minikube), isolating the test to the service itself using kubectl port-forward deployment/djangoapi 7000:8000 (a rough timing loop for this is sketched after this list):

    • Django service on Minikube: ~ 200ms
    • Django service on EKS: 4000-8000 ms
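
For reference, a minimal timing loop along the following lines (the URL and request count are placeholders) is enough to reproduce these per-request numbers:

    # Rough latency check against the port-forwarded service.
    # URL and request count are placeholders - adjust to the endpoint under test.
    import time
    import urllib.request

    URL = "http://127.0.0.1:7000/"  # kubectl port-forward deployment/djangoapi 7000:8000
    N = 20

    timings = []
    for _ in range(N):
        start = time.perf_counter()
        with urllib.request.urlopen(URL) as resp:
            resp.read()
        timings.append((time.perf_counter() - start) * 1000)

    timings.sort()
    print(f"min {timings[0]:.0f} ms, median {timings[N // 2]:.0f} ms, max {timings[-1]:.0f} ms")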

Observations

  1. The response time on my EKS deployment varies a lot: most responses take 4-8 seconds, but occasionally one takes only ~150ms.

  2. According to the EC2 monitoring metrics, all 3 nodes run at around 2.5% CPU utilization, peaking at 8%.

  3. Before this project was dockerized and deployed to Kubernetes, the Django code was hosted on a single t2.medium instance running Ubuntu with Nginx, and the average response time was ~300ms. So I'm fairly sure the Django API itself isn't causing the slowdown.

I understand it is unfair to compare against a local deployment, since the networking is simpler and the available resources differ (a 2-core 2.3GHz i5 Mac with 2GB RAM), but this is a startling difference.

Has anyone had a similar experience, or any suggestions on how to debug this further?

-- Ivan
amazon-eks
amazon-web-services
django
kubernetes
latency
