Performance of FastAPI with Gunicorn

12/31/2021

I have a FastAPI server (with Gunicorn) inside a Docker container deployed on Kubernetes. The specs of the pod are as follows:

{
"memory_data": {
        "total": 67447971840,
        "available": 52117856256,
        "percent": 22.7,
        "used": 14726705152,
        "free": 11088379904,
        "active": 19289759744,
        "inactive": 33852702720,
        "buffers": 820379648,
        "cached": 40812507136,
        "shared": 3829760,
        "slab": 2861637632
    },
    "logical_cpu": 16,
    "physical_cpu": 8
}
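
For reference, specs like these can be collected with psutil, roughly as sketched below (just a sketch; not necessarily the exact code that produced the JSON above):

import psutil

def pod_specs():
    # virtual_memory() exposes total/available/percent/used/free/... on Linux
    return {
        "memory_data": dict(psutil.virtual_memory()._asdict()),
        "logical_cpu": psutil.cpu_count(logical=True),
        "physical_cpu": psutil.cpu_count(logical=False),
    }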

To load test the CPU performance, I have made an API endpoint which just runs a triply nested loop before returning "ok". The code can be found below:

@api.get("/cpu_loadtest")
def load_test():
    list1 = list()
    for i in range(1000):
        for j in range(100):
            for k in range(50):
                list1.append(i*j*k)
    list1.clear()
    return "ok"

A single request to this endpoint takes around 750 to 850 ms. Since I have 8 physical cores, I start 8 Gunicorn workers in my Dockerfile, expecting at least ~8 rps, since each worker should manage roughly 1.2 requests per second at that latency.
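
For reference, the loop body alone reproduces that latency when timed outside the server (a plain-Python sketch):

import time

def loop_body():
    list1 = list()
    for i in range(1000):
        for j in range(100):
            for k in range(50):
                list1.append(i * j * k)
    list1.clear()

start = time.perf_counter()
loop_body()
print(f"one iteration: {time.perf_counter() - start:.3f} s")  # roughly 0.75-0.85 s on this pod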

But after load testing the API, I get the following results:

total rps: 11.60, failure rps: 11.40, min response time: 10 secs, max response time: 43 secs
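
For context, the load test is driven by a simple concurrent client along these lines (a sketch using requests and a thread pool; not the exact script):

import concurrent.futures
import requests

URL = "http://<service-host>:4001/cpu_loadtest"  # placeholder host

def hit(_):
    try:
        return requests.get(URL, timeout=60).status_code
    except requests.RequestException:
        return "error"

with concurrent.futures.ThreadPoolExecutor(max_workers=100) as pool:
    codes = list(pool.map(hit, range(100)))

print({code: codes.count(code) for code in set(codes)})  # counts per status code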

What am I missing here? I don't know whether this is a performance issue or Gunicorn not being able to handle multiple connections.

Anyway, here is my Dockerfile:

FROM tiangolo/uwsgi-nginx:python3.7

# Update the OS
RUN apt-get update

# Environment arguments
ARG APP_ENV=local
ENV APP_ENV=$APP_ENV

# Make workdir
WORKDIR /app


# Requirements
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install -r /tmp/requirements.txt

# Add app
COPY setup.py /app
# COPY start.sh /app # TODO for later
COPY ./app /app/app



RUN pip install -e .
EXPOSE 4001
CMD [ "gunicorn", "--bind=0.0.0.0:4001", "--workers=8", "--timeout=60", "--threads=1", "--worker-class=uvicorn.workers.UvicornWorker", "app.main:app"]

I then tested with 100 requests, which gives the following result:

total time taken = 41.95 s
success: 19
fail: 81

CPU performance: [screenshot]

All 81 failed requests return 502 Bad Gateway. But when I change the endpoint signature from def to async def:

async def load_test_async():
    list1 = list()
    for i in range(1000):
        for j in range(100):
            for k in range(50):
                list1.append(i*j*k)
    list1.clear()
    return "ok"

It gives the following results for 100 requests:

total time taken = 40.49 s
success: 100
fail: 0

CPU performance: [screenshot]

The difference I observe between the two cases (apart from the failing requests) is that with async the average number of running threads is between 3 and 5, whereas with the earlier one it is between 7 and 9. Also, the CPU% is more evenly distributed across cores with the non-async method.
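
As far as I understand, FastAPI/Starlette dispatches plain def endpoints to a threadpool while async def endpoints run directly on the event loop, which would explain the thread counts above. A minimal sketch of that difference (not the actual framework internals):

from fastapi.concurrency import run_in_threadpool

async def call_sync_endpoint(endpoint):
    # plain `def` endpoints are offloaded to a worker thread; for pure-Python,
    # CPU-bound code the GIL means those extra threads don't add throughput
    return await run_in_threadpool(endpoint)

async def call_async_endpoint(endpoint):
    # `async def` endpoints are awaited inline; a CPU-bound body simply blocks
    # this worker's event loop until it finishes
    return await endpoint()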

-- sohrabkacho
docker
fastapi
gunicorn
kubernetes
performance-testing

0 Answers