Selenium Hub cannot handle test queue when deployed with Kubernetes


I'm running into the exception below when I run several tests in parallel against a Selenium Grid deployed with K8s. I have deployed clusters in both AWS and Azure and get the same error in each. It occurs when I try to run more tests than there are nodes: a few tests run successfully, then after a short time all of the remaining tests fail with this error.

OpenQA.Selenium.WebDriverException : A exception with a null response was thrown sending an HTTP request to the remote WebDriver server for URL The status of the exception was ConnectionClosed, and the message was: The underlying connection was closed: The connection was closed unexpectedly.

I have adjusted the timeouts on the Selenium hub (browserTimeout, timeout, newSessionWaitTimeout) as well as the command timeout on the RemoteWebDriver, and nothing changes. I also do not get the error when I test locally.

Here is my current stack.

  • RemoteWebDriver: 3.14.0
  • Selenium Hub: 3.14.0
  • Selenium Node: 3.14.0
  • Chrome: 69
  • C#/NUnit


    [TestCaseSource(typeof(MyFactoryClass), nameof(MyFactoryClass.TestCases))]
    public void ZaleniumTest(int x)
    {
        var caps = new DesiredCapabilities();
        caps.SetCapability("browserName", "chrome");
        // Url is the hub endpoint; 1200 s is the per-command timeout.
        var driver =
            new RemoteWebDriver(new Uri(Url), caps, TimeSpan.FromSeconds(1200)) {Url = ""};
        driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(1200);
        var query = driver.FindElement(By.Name("q"));
        // (rest of the test omitted)
    }

Here are the commands that I run to deploy the grid:

    kubectl run selenium-hub --image selenium/hub:latest --port 4444
    kubectl expose deployment selenium-hub --type=LoadBalancer
    kubectl run selenium-node-chrome --image selenium/node-chrome:latest --env="HUB_PORT_4444_TCP_ADDR=selenium-hub" --env="HUB_PORT_4444_TCP_PORT=4444"

With this simple grid setup (1 hub, 1 Chrome node) I try to run 20 tests, expecting the extra tests to be queued. After about 10 passing tests, the test run fails with the error above.

I am looking for the correct place to add a wait or a timeout so that the test queue is handled correctly.

Thanks in advance.

-- akiaki

I was able to locate the issue. It turns out there is an idle timeout setting on the load balancers in both AWS and Azure. In my case it was set to 60 seconds in AWS. I raised the timeout to 3600 seconds and that seemed to fix the problem.
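
For anyone who would rather set this from Kubernetes than from the AWS console, here is a rough sketch. It assumes the hub Service is the `selenium-hub` one created by `kubectl expose` above, that it is backed by a classic AWS ELB, and that the cloud provider honours the `service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout` annotation (value in seconds). The function names are made up for illustration; I have not verified this on every cluster version.

```shell
#!/bin/sh
# Build the AWS idle-timeout annotation for a given number of seconds.
# Assumption: the Service is fronted by a classic ELB that reads this annotation.
idle_timeout_annotation() {
    printf 'service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout=%s' "$1"
}

# Apply the annotation to a Service (hypothetical helper, requires kubectl access).
#   $1 = Service name, $2 = idle timeout in seconds
apply_idle_timeout() {
    kubectl annotate service "$1" --overwrite "$(idle_timeout_annotation "$2")"
}
```

Calling `apply_idle_timeout selenium-hub 3600` should then match the 3600 seconds I set by hand. Azure load balancers have their own idle timeout setting with different units and limits, so check the Azure documentation rather than reusing this value there.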

-- akiaki

You didn't say which host you're trying to connect to. Are you sure you're connecting to the load balancer created by Kubernetes?
Try

    kubectl expose deployment selenium-hub --type=LoadBalancer --name=my-service

followed by

    kubectl describe services my-service

That should tell you whether there was a problem setting up the load balancer, and which IP/host you should actually connect to.

-- samhain1138
