I have deployed a Django REST API container on Amazon EKS (Kubernetes), but a simple HttpResponse takes around 4-8 seconds:
from django.http import HttpResponse

def home(request):
    return HttpResponse("Homepage")
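For reference, a response time like this can be broken down with curl's built-in timing variables (django.mydomain.com is the ingress host from further below):

# Break the request into DNS, connect, and first-byte timings
curl -s -o /dev/null \
  -w "dns: %{time_namelookup}s  connect: %{time_connect}s  ttfb: %{time_starttransfer}s  total: %{time_total}s\n" \
  http://django.mydomain.com/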
Here is my stack:
Containers:
NAME                               READY   STATUS    RESTARTS   AGE
pod/djangoapi-65799dd6dc-gkpfp     1/1     Running   0          1h
pod/echoheaders-5cff747d7d-n4jnt   1/1     Running   0          1h
pod/redis-7d9fbf54cd-lpffv         1/1     Running   0          1h

NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/djangoapi-svc   NodePort    10.100.41.163    <none>        8000:30327/TCP   11d
service/echoheaders     NodePort    10.100.94.85     <none>        80:30317/TCP     12d
service/kubernetes      ClusterIP   10.100.0.1       <none>        443/TCP          35d
service/redis           ClusterIP   10.100.210.207   <none>        6379/TCP         35d

NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/djangoapi     1         1         1            1           11d
deployment.apps/echoheaders   1         1         1            1           12d
deployment.apps/redis         1         1         1            1           35d
When the same DjangoAPI image was deployed to my local setup (minikube), the average response time was around 200 ms.
Dockerfile for Django:
FROM python:3.7
# Skip .pyc files and disable output buffering so logs appear immediately
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
RUN pip install --upgrade pip
RUN mkdir /app
WORKDIR /app
# Copy the project, then install its dependencies
COPY . /app/
RUN pip install -r requirements.txt
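To rule out the image itself, it can be run directly with Docker and timed outside Kubernetes entirely (a sketch; any environment variables from the ConfigMap below would need to be passed with -e):

# Run the same image with the same gunicorn command, outside Kubernetes
docker run --rm -p 8000:8000 wbivan/app:speed \
  gunicorn api.wsgi --bind 0.0.0.0:8000

# In another shell: time a request against the local container
curl -s -o /dev/null -w "total: %{time_total}s\n" http://localhost:8000/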
Django Deployment & Service YAML:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: djangoapi
    type: web
  name: djangoapi
  namespace: "default"
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: djangoapi
        type: web
    spec:
      containers:
      - name: djangoapi
        image: wbivan/app:speed
        imagePullPolicy: Always
        args:
        - gunicorn
        - api.wsgi
        - --bind
        - 0.0.0.0:8000
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        envFrom:
        - configMapRef:
            name: djangoapi-config
        ports:
        - containerPort: 8000
      imagePullSecrets:
      - name: regcred
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: djangoapi-svc
  namespace: "default"
  labels:
    app: djangoapi
spec:
  ports:
  - port: 8000
    protocol: TCP
    targetPort: 8000
  selector:
    app: djangoapi
    type: web
  type: NodePort
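One way to take the load balancer and ingress out of the picture is to curl this Service from a throwaway pod inside the cluster (a sketch using the public curlimages/curl image):

# Hit the ClusterIP Service directly from inside the cluster
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s -o /dev/null -w "total: %{time_total}s\n" \
  http://djangoapi-svc.default.svc.cluster.local:8000/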
I tried setting and removing resource limits on each deployment:

resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "128Mi"
    cpu: "500m"

but neither change made much difference.
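To check whether the CPU limit was actually throttling the container, the cgroup counters can be read from inside the pod (a sketch; the path assumes cgroup v1, and the pod name is from the listing above):

# nr_throttled > 0 means the kernel is throttling the container on its CPU limit
kubectl exec djangoapi-65799dd6dc-gkpfp -- cat /sys/fs/cgroup/cpu/cpu.stat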
I then suspected the Elastic Load Balancer was the problem, which is why I deployed an echo server (using the image gcr.io/google_containers/echoserver:1.4). But most requests to it only took ~50 ms.
# This took ~50ms
- host: echo.mydomain.com
  http:
    paths:
    - path: /*
      backend:
        serviceName: echoheaders
        servicePort: 80
# This took ~8000ms!!!
- host: django.mydomain.com
  http:
    paths:
    - path: /*
      backend:
        serviceName: djangoapi-svc
        servicePort: 8000
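Since both rules go through the same load balancer, the difference can be reproduced from outside by varying only the Host header (the ELB hostname below is a placeholder):

# Same ELB endpoint, different backend: only the Host header changes
curl -s -o /dev/null -w "echo:   %{time_total}s\n" -H "Host: echo.mydomain.com"   http://ELB-DNS-NAME/
curl -s -o /dev/null -w "django: %{time_total}s\n" -H "Host: django.mydomain.com" http://ELB-DNS-NAME/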
So clearly the Django service was the problem. To verify that, I compared the same deployment on EKS and on my local machine (minikube), isolating the test to just the service itself with kubectl port-forward deployment/djangoapi 7000:8000.
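Concretely, the isolated test was just the forward plus a timed request:

# Forward local port 7000 straight to the pod, bypassing the Service and ELB
kubectl port-forward deployment/djangoapi 7000:8000 &
curl -s -o /dev/null -w "total: %{time_total}s\n" http://localhost:7000/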
What I have noticed is that the response time on my EKS deployment varies a lot: most responses took 4-8 seconds, but occasionally one would take only ~150 ms.
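The variance is easy to see by repeating the request in a loop against the port-forwarded endpoint:

# 20 sequential requests; on EKS most land in the 4-8s range with the odd ~150ms outlier
for i in $(seq 20); do
  curl -s -o /dev/null -w "%{time_total}s\n" http://localhost:7000/
done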
According to the EC2 monitoring graphs, all three nodes were running at around 2.5% CPU utilization, with peaks at 8%.
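kubectl shows the same picture from inside the cluster (assuming metrics-server is installed; the pod name is from the listing above):

# Node- and pod-level usage as Kubernetes reports it
kubectl top nodes
kubectl top pod djangoapi-65799dd6dc-gkpfp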
Before this project was dockerized and deployed with Kubernetes, the Django code was hosted on a single t2.medium instance running Ubuntu with Nginx. The average response time was ~300 ms, so I'm fairly sure the Django API itself isn't causing the slowdown.
I understand it is unfair to compare against a local deployment, since the networking is simpler and the available resources differ (a 2-CPU 2.3 GHz i5 Mac, 2 GB RAM), but this is a startling difference.
Does anyone have a similar experience and any suggestions on how to further debug the situation?