I have a Node/React application deployed on a Kubernetes cluster with a MySQL backend. I have found that when hitting the same route concurrently, each successive request gets considerably slower, and for routes where the data load is slightly larger (for example, one page returns 330 MySQL rows), the Kubernetes pods crash and the requests time out. This is causing server errors and making the site unusable.
Please see the ab test results below for 100 requests (10 concurrent):

    Percentage of the requests served within a certain time (ms)
      50%    711
      66%    850
      75%   1105
      80%   1199
      90%   1562
      95%   1763
      98%   1831
      99%   2102
     100%   2102 (longest request)
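For reference, these numbers come from an invocation along these lines (the URL is a placeholder for the actual route):

    ab -n 100 -c 10 https://my-site.example.com/some-route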
A similar test against a page that loads more data, with only 10 requests (10 concurrent), ends in:

    The timeout specified has expired (70007)
Can anyone suggest why the pods are crashing and how to prevent this? Honestly, I don't feel that 330 rows of data is an awful lot, even with 10 concurrent requests.
The cluster specs should be suitable:

    Machine type: n1-standard-1 (1 vCPU, 3.75 GB memory)
    Total cores: 3 vCPUs
    Total memory: 11.25 GB
What's also odd is that I have 3 pods for each component of the application: frontend, gateway and backend. Only the frontend pods are crashing; I would have thought that, if anything, the backend pods would be the ones to crash, as they're the ones handling the DB calls.
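If it helps, I can pull more detail on the crashes with something like the following (the pod name is a placeholder for one of the real frontend pods):

    kubectl get pods                               # check restart counts / CrashLoopBackOff on the frontend pods
    kubectl describe pod frontend-7d9f8-abc12      # "Last State" shows the termination reason (e.g. OOMKilled)
    kubectl logs frontend-7d9f8-abc12 --previous   # logs from the previously crashed container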
For anyone else who runs into a similar problem, I found that it stemmed from a combination of factors:
Although the query itself took a minuscule amount of time to run (53ms), the results sent back contained every column of every joined entity (i.e. if I queried a table with 3 joins, I'd receive 4 tables' worth of columns in the response). By itself that would be fine, but the frontend and gateway were using GraphQL, which seemed unable to cope with processing that much data concurrently.
The resolution was to restructure the query so that only the required columns/rows were retrieved (this took the number of rows down to double digits).
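To illustrate the kind of change (the table and column names here are made up, not the real schema): the original query effectively pulled back every column of every joined table, while the restructured one asks only for the columns the page renders and filters the rows down.

    -- Before (illustrative): every column of all 4 tables came back for each row
    SELECT *
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN products  p ON p.id = o.product_id
    JOIN addresses a ON a.id = c.address_id;

    -- After (illustrative): only the columns the page needs, and only the relevant rows
    SELECT o.id, o.created_at, c.name, p.title
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN products  p ON p.id = o.product_id
    WHERE o.created_at >= CURDATE() - INTERVAL 30 DAY;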