TLS handshake timeout when load testing using Gatling

11/23/2017

I am currently load testing my service using Gatling on AWS. I ran several load tests over HTTP and the service worked perfectly, with no handshake errors. After we moved to HTTPS, the load test results showed TLS handshake timeout exceptions all over the place, and an OOM error was eventually thrown because unprocessed requests were being queued.

Additional information:

  • The Gatling scenario is as follows: each user sends three requests, then one more request with the Connection: close header. I wanted to simulate three requests on a kept-alive connection that is closed at the end.
  • My service is managed by Kubernetes.

What I have done:

  • I ran the load test from another Gatling instance, but the error persists.
  • I restarted the AWS load balancer.

Additional notes: there are no 4xx or 5xx errors, but we do see client TLS negotiation errors.

My questions:

  1. Does the error occur because of the initial handshake required for HTTPS?
  2. Does the error occur because of the AWS load balancer?

Thank you.

-- Vincent acent
amazon-web-services
gatling
kubernetes
ssl

2 Answers

11/23/2017

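You need to add an SSL debug flag to the client (for a JVM-based client that uses the JDK's TLS implementation, this is typically the -Djavax.net.debug=ssl system property); it will show the nature of the error. A TLS handshake timeout is usually due to a cipher/protocol mismatch.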

Find out which TLS protocol and cipher suites the Gatling client is using, and make sure your ELB HTTPS listener supports a matching protocol and cipher.

From "SSL Negotiation Configurations for Classic Load Balancers":

Elastic Load Balancing uses a Secure Socket Layer (SSL) negotiation configuration, known as a security policy, to negotiate SSL connections between a client and the load balancer. A security policy is a combination of SSL protocols, SSL ciphers, and the Server Order Preference option. For more information about configuring an SSL connection for your load balancer, see Listeners for Your Classic Load Balancer.

Try allowing all ciphers and protocols in that security policy.
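
As a starting point for the comparison described above, here is a minimal sketch that prints the TLS protocols and cipher suites the JVM enables by default. It assumes the load generator uses the JDK's built-in TLS implementation; if Gatling is configured with a different TLS provider or a restricted cipher list, the effective values may differ. The class and method names (JvmTlsDefaults, defaultProtocols, defaultCipherSuites) are made up for this illustration.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

// Hypothetical helper: lists what the JVM's default SSLContext enables.
// This reflects JDK defaults only, not any Gatling-specific overrides.
public class JvmTlsDefaults {

    private static SSLParameters defaults() {
        try {
            return SSLContext.getDefault().getDefaultSSLParameters();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    // Protocol versions enabled by default, e.g. entries such as "TLSv1.2".
    static List<String> defaultProtocols() {
        return Arrays.asList(defaults().getProtocols());
    }

    // Cipher suites enabled by default, in the JVM's preference order.
    static List<String> defaultCipherSuites() {
        return Arrays.asList(defaults().getCipherSuites());
    }

    public static void main(String[] args) {
        System.out.println("Enabled protocols: " + defaultProtocols());
        System.out.println("Enabled cipher suites:");
        for (String suite : defaultCipherSuites()) {
            System.out.println("  " + suite);
        }
    }
}
```

Run it with the same Java installation that runs Gatling, then compare the output with the protocols and ciphers listed in the load balancer's security policy.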

-- Rodrigo M
Source: StackOverflow

12/12/2017

So it seems the problem was that Gatling's TLS handshakes took longer than the rate at which new users were being created allowed for. Decreasing the number of users created per second and increasing the RPS solved it.
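
To make the arithmetic behind this concrete, here is a rough sketch based on Little's law (average items in flight = arrival rate × time per item). It assumes that each simulated user opens one new connection and therefore performs one TLS handshake, which matches the scenario in the question (three keep-alive requests plus a final Connection: close request, so four requests per connection). The handshake duration and target rates in main are made-up illustration values, not measurements, and the class and method names are hypothetical.

```java
// Hypothetical back-of-the-envelope model of handshake load.
public class HandshakeLoad {

    // Little's law: average number of handshakes in progress at any moment.
    static double inFlightHandshakes(double newUsersPerSecond, double handshakeSeconds) {
        return newUsersPerSecond * handshakeSeconds;
    }

    // New connections (and thus handshakes) per second needed to reach a
    // target request rate when each connection carries a fixed number of requests.
    static double handshakesPerSecond(double targetRps, int requestsPerConnection) {
        return targetRps / requestsPerConnection;
    }

    public static void main(String[] args) {
        double targetRps = 2000.0;     // assumed target load
        double handshakeSeconds = 0.2; // assumed average handshake duration

        // 4 requests per connection as in the question, versus a longer-lived connection.
        for (int requestsPerConnection : new int[] {4, 20}) {
            double perSecond = handshakesPerSecond(targetRps, requestsPerConnection);
            double inFlight = inFlightHandshakes(perSecond, handshakeSeconds);
            System.out.printf(
                "%d requests/connection: %.0f handshakes/s, about %.0f in flight%n",
                requestsPerConnection, perSecond, inFlight);
        }
    }
}
```

For a fixed request rate, sending more requests per user lowers the number of new connections per second, and with it the number of handshakes that have to complete concurrently.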

-- Vincent acent
Source: StackOverflow