I have a FastAPI server (running under Gunicorn) inside a Docker container deployed on Kubernetes. The specs of the pod are as follows:
{
    "memory_data": {
        "total": 67447971840,
        "available": 52117856256,
        "percent": 22.7,
        "used": 14726705152,
        "free": 11088379904,
        "active": 19289759744,
        "inactive": 33852702720,
        "buffers": 820379648,
        "cached": 40812507136,
        "shared": 3829760,
        "slab": 2861637632
    },
    "logical_cpu": 16,
    "physical_cpu": 8
}
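For context, these field names match what psutil reports; a minimal sketch of how such specs can be collected (assuming psutil is installed):

```python
import psutil

def pod_specs():
    # psutil.virtual_memory() exposes the same fields as the JSON above
    # (total, available, percent, used, free, active, inactive, ...)
    return {
        "memory_data": dict(psutil.virtual_memory()._asdict()),
        "logical_cpu": psutil.cpu_count(logical=True),
        "physical_cpu": psutil.cpu_count(logical=False),
    }

print(pod_specs())
```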
To load test the CPU performance I made an endpoint that just iterates over a triple nested loop before returning "ok". The code is below:
@api.get("/cpu_loadtest")
def load_test():
    list1 = []
    for i in range(1000):
        for j in range(100):
            for k in range(50):
                list1.append(i * j * k)
    list1.clear()
    return "ok"
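For reference, the loop body alone can be timed outside the server with only the standard library (a minimal sketch, independent of FastAPI/Gunicorn):

```python
import time

def cpu_work():
    # Same triple loop as the endpoint: 1000 * 100 * 50 = 5,000,000 appends
    list1 = []
    for i in range(1000):
        for j in range(100):
            for k in range(50):
                list1.append(i * j * k)
    list1.clear()

start = time.perf_counter()
cpu_work()
elapsed = time.perf_counter() - start
print(f"one call took {elapsed:.3f} s")
```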
A single request to this endpoint takes around 750 to 850 ms. Since I have 8 physical cores, I start 8 Gunicorn workers in my Dockerfile, expecting at least 8 rps at the above rate.
But after testing the API I get the following results:
total rps: 11.60, failure rps: 11.40, min response time: 10 secs, max response time: 43 secs
What am I missing here? I don't know whether this is a performance issue or Gunicorn not being able to handle multiple connections.
Anyway, here is my Dockerfile:
FROM tiangolo/uwsgi-nginx:python3.7
#update the os
RUN apt-get update
#environment arguments
ARG APP_ENV=local
ENV APP_ENV=$APP_ENV
#make workdir
WORKDIR /app
#Requirements
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt
# Add app
COPY setup.py /app
# COPY start.sh /app # TODO for later
COPY ./app /app/app
RUN pip install -e .
EXPOSE 4001
CMD [ "gunicorn", "--bind=0.0.0.0:4001", "--workers=8", "--timeout=60", "--threads=1", "--worker-class=uvicorn.workers.UvicornWorker", "app.main:app"]
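For reference, the kind of concurrent client I use to produce the numbers below can be sketched with only the standard library (the URL and counts here are placeholders, not my exact test script):

```python
import concurrent.futures
import urllib.error
import urllib.request

def run_load_test(url, n_requests=100, concurrency=20, timeout=60):
    """Fire n_requests GETs at url from a thread pool; count successes/failures."""
    def one_request(_):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            # Connection errors and non-2xx responses (e.g. 502) count as failures
            return False

    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_request, range(n_requests)))
    return results.count(True), results.count(False)

# Usage: success, fail = run_load_test("http://localhost:4001/cpu_loadtest")
```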
I tested with 100 requests, which gives the following result:
total time taken = 41.945443981000004,
success: 19
fail: 81
CPU performance:
All 81 failed requests return 502 Bad Gateway. But when I change the method signature from def to async def:
@api.get("/cpu_loadtest_async")
async def load_test_async():
    list1 = []
    for i in range(1000):
        for j in range(100):
            for k in range(50):
                list1.append(i * j * k)
    list1.clear()
    return "ok"
It gives the following results for 100 requests:
total time taken = 40.487646999
success: 100
fail: 0
The difference I observe between the two cases (apart from the failing requests) is that in the async case the average running thread count is between 3 and 5, while in the non-async case it is between 7 and 9. Also, the CPU% is more evenly distributed across cores with the non-async method.
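The thread counts above can be sampled from inside a worker with the standard library (a minimal sketch; `threading.active_count()` counts Thread objects alive in the current process):

```python
import threading

def sample_thread_count():
    # Number of threads currently alive in this worker process;
    # includes the main thread plus any threadpool threads FastAPI
    # spawns to run non-async ("def") endpoints
    return threading.active_count()

print(sample_thread_count())
```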