Can anyone explain why my load test gives better TPS on one pod than after scaling to two pods?
I expected that running the same scenario with the same configuration on 2 pods would increase the TPS, but that is not what happened.
Is it normal that scaling horizontally does not improve the total number of requests?
Please note that I didn't get any failures on one pod; I only scaled to 2 for high availability.
If you are using any sort of database, this is the place to optimize to increase TPS. Here is why:
Assume your database is already running as fast as it can. The pod can handle the network connections and issue queries against the database, but the database is slow because its CPU, memory, or TPS is already maxed out. Sending more transactions from pod 1 will not increase your TPS, because the DB's TPS is the limiting factor.
Now if you add pod 2, your DB still has maxed-out CPU/memory/TPS, but it now has to spend some of the CPU it was using to complete transactions on managing the DB connections from pod 2, which have to be queued until the DB has CPU/memory free to handle them. That ultimately lowers the TPS of the DB and of the entire app.
TL;DR: If your DB is maxed out, adding pods that run transactions against it lowers the TPS, because the DB has to take resources away from actively handling transactions (your limiting factor for TPS) to handle queued DB connections instead.
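To make that concrete, here is a toy model in Python. Every name and number in it (`effective_tps`, `overhead_per_conn`, the capacities) is made up for illustration and not measured from any real system. It only assumes that each open connection costs the DB a small fixed fraction of its capacity.

```python
def effective_tps(pods, conns_per_pod, db_capacity_tps,
                  overhead_per_conn, pod_capacity_tps):
    """Toy model: total TPS is capped by the slower of the pods and the DB."""
    connections = pods * conns_per_pod
    # Assumed: each connection eats a fixed fraction of DB capacity
    # (connection management, queueing) that no longer runs transactions.
    overhead = min(1.0, connections * overhead_per_conn)
    db_tps = db_capacity_tps * (1.0 - overhead)
    # The pods themselves scale linearly in this model.
    app_tps = pods * pod_capacity_tps
    return min(app_tps, db_tps)


# Hypothetical numbers: the DB, not the pod, is the bottleneck.
for pods in (1, 2, 4):
    tps = effective_tps(pods, conns_per_pod=50, db_capacity_tps=1000,
                        overhead_per_conn=0.002, pod_capacity_tps=1500)
    print(f"{pods} pod(s): {tps:.0f} TPS")
```

With these made-up inputs the total TPS goes down as pods are added, because the pods were never the limiting factor. If the DB had headroom instead, the same model would show TPS rising with the pod count.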
To fix this, vertically scale your write DBs, horizontally scale your read DBs, optimize DB transactions with indexes or better queries, use PgBouncer to manage DB connections, and make sure your DB transaction type is the most efficient one for your use case.
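As a rough illustration of the connection-management point: a pooler caps how many connections reach the DB at once and makes the rest wait outside of it. The Python sketch below imitates that idea with a semaphore. `BoundedPool` and `fake_query` are hypothetical names, the limit of 4 is arbitrary, and this is not how PgBouncer itself is implemented.

```python
import threading
import time


class BoundedPool:
    """Lets at most max_connections callers run their work at the same time."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)
        self._lock = threading.Lock()
        self.active = 0  # work items currently running
        self.peak = 0    # highest concurrency observed

    def run(self, work):
        # Callers block here, outside the "DB", until a slot is free.
        with self._slots:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return work()
            finally:
                with self._lock:
                    self.active -= 1


def fake_query():
    # Stand-in for a real DB round trip.
    time.sleep(0.01)
    return "ok"


pool = BoundedPool(max_connections=4)
threads = [threading.Thread(target=pool.run, args=(fake_query,))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrent queries:", pool.peak)
```

The 20 callers stand in for requests coming from several pods. However many pods you add, the DB never sees more than 4 concurrent queries, so it is not spending its capacity on juggling connections.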
It really depends on what your pod does, as @spencer mentioned. Besides that, many other factors can affect the result you expect.
Based on your case, I guess your pod is not the limiting factor for TPS.
In general, increasing the number of pod replicas should at least not lower the TPS.
"...my load test on one pod it gives better TPS rather than when scaling to two pods."
This can happen when the 2 pods race for the same resource and create a bottleneck.
"Is this normal behaviour that scaling horizontal not improve the total number of requests?"
The number of client (web) requests served can improve, but the capacity of the backend, and sometimes the middleware too (if any), needs to catch up.